Siddhant Bansal

I am an MS by Research student at CVIT, IIIT Hyderabad. I am advised by Dr C.V. Jawahar and work on building an OCR (Optical Character Recognition) system for Indian languages (Hindi, Tamil, and Telugu). My current research focuses on improving word retrieval and recognition in large document corpora. I am broadly interested in 2D and 3D Computer Vision, Deep Learning, and related problems.

My ultimate goal is to contribute to the development of machines capable of reading an instruction manual and creating new machines! I'm a very inquisitive person and always willing to learn about fields including, but not limited to, science, technology, astrophysics, and physics.

Check out my Blog!

Email  /  CV  /  Google Scholar  /  Github  /  LinkedIn  /  arXiv  /  ORCID

Navigate to:    Updates  /  Research  /  Projects  /  Publications

My Photo!

Research Experience
Word recognition results

Research Fellow
CVIT, IIIT-Hyderabad

August 2019 - January 2020 (Hyderabad, Telangana)

Project page  Poster  Demo  Code  Paper [DAS2020 (Oral Presentation)]

Worked on improving word recognition and retrieval in large document collections for Indian languages such as Hindi, Telugu, and Tamil. This work was supervised by Dr Praveen Krishnan and Dr C.V. Jawahar.

Major contributions:

  1. Improved word accuracy by 1.4% for Hindi and 1.8% for Telugu by merging text hypotheses and deep embeddings.
  2. Proposed techniques such as Naive Merge and Query Expansion, which improved word retrieval by 11.12% for Hindi.

Built an OCR system for Hindi, Tamil, and Telugu. Worked on a novel semi-supervised training technique for a Convolutional Recurrent Neural Network (trained with CTC loss). Reported improvements of 2.5% in word accuracy and 5% in character accuracy.

ICP on Chairs GIF

Research Intern
IIT Gandhinagar

March 2019 - August 2019 (Gandhinagar, Gujarat)


Worked on the project titled "Cultural Heritage Preservation and Restoration using Digital 3D Models" under Prof. Shanmuganathan Raman. The project was supported by NVIDIA and IMPRINT (Impacting Research Innovation and Technology), an initiative of the Government of India.

Major work done:

  1. Collected data as point clouds using a Faro Focus 3D laser scanner.
  2. Aligned point clouds using ICP (with eigenvalues and eigenvectors, and with SVD), and studied deep learning approaches such as Deep Closest Point, DeepICP, Discriminative Optimization, an auto-encoder approach, and PointNetLK.
  3. Developed an algorithm for point cloud completion using a fully-connected auto-encoder and obtained reasonable results on the ShapeNet dataset.
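
The SVD-based step at the core of ICP can be sketched as follows. This is a minimal illustration in NumPy, not the project's code: it assumes point correspondences are already known, and the function name `best_fit_transform` is hypothetical. A full ICP loop would alternate between nearest-neighbour matching and this step until the alignment stops improving.

```python
import numpy as np


def best_fit_transform(src, dst):
    """Least-squares rigid transform (R, t) that maps src onto dst.

    src, dst: (N, 3) arrays where row i of src corresponds to row i of dst.
    (Hypothetical helper for illustration; correspondences are assumed known.)
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)

    # 3x3 cross-covariance of the centred point sets
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so R is a rotation, not a reflection
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t
```

Applying the result as `src @ R.T + t` moves the source cloud onto the target.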

ELOPE Flow Chart

Artificial Intelligence Intern
Meditab Software, Inc.

September 2018 - March 2019 (Ahmedabad, Gujarat)

Website Paper

Worked on the project titled "Facility Layout Optimization using Genetic Algorithm".

Major work done:

  1. Generated optimal facility layouts by implementing ELOPE (Evolutionary Layout Optimization and Evaluator) and using it with a genetic algorithm. This reduced the travelling time of the DosePacker robots by 75%, making the DosePacker system more efficient.
  2. Created an automatic log file analyzer capable of predicting possible machine breakdowns, which reduced the maintenance time of the DosePacker system by 70% and lowered maintenance costs.

Foot Images Samples

Artificial Intelligence Research Intern
Bennett University

June 2018 - July 2018 (Greater Noida, Uttar Pradesh)

Website  YouTube

Worked on the project titled "Credibility Examination of Human Footprint Using Minutiae Features". NVIDIA supported the project by providing a DGX-1 (Tesla V100) system.

Major work done:

  1. Collected a dataset of footprints from 180 volunteers using a paper scanner at 600 dpi.
  2. Developed a custom Convolutional Neural Network for classifying humans based on the shape and size of their footprints. The network was trained on the data collected earlier.

Bioscan Device

Data Analyst Intern
Bioscan Research

April 2018 - June 2018 (Ahmedabad, Gujarat)

Worked on applying Artificial Intelligence and Machine Learning to an onsite detection tool for instantaneous scanning of intracranial bleeding.

Major work done:

  1. Created patient-tracking software using Python and SQLite, which improved the workflow of the people collecting the brain scans.
  2. Detected the actual signal amid noise in brain scans from a near-infrared laser scanner by implementing an automatic signal extractor in Python.

Projects


Automatic Garbage Detection and Collection
Paper  Website

The task was to build a device capable of detecting garbage and picking it up automatically.

  • Led a team of three and developed a system capable of detecting waste bottles using a CNN (MobileNets).
  • Developed an algorithm for roughly estimating the depth of the garbage (with an error margin of 2 cm).
  • Developed a path planning algorithm based on PID control, treating the bottle as the centre.
  • The code was efficient enough to run on a Raspberry Pi.
  • One of the 4% of projects selected for demonstration at the SSIP annual conference.
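
The PID idea behind the path planning can be sketched as below: the controller drives the horizontal offset between the camera frame centre and the detected bottle towards zero. This is an illustrative sketch only; the class name, gains, and pixel values are assumptions and not the ones used in the project.

```python
class PID:
    """Textbook PID controller (hypothetical helper for illustration)."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        # Accumulate the integral term and estimate the derivative
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0  # no history on the first call
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Assumed values: a 320-pixel-wide frame and a bottle detected at x = 200
frame_centre_x = 160
bottle_centre_x = 200

pid = PID(kp=0.5, ki=0.0, kd=0.1)
steering = pid.update(error=bottle_centre_x - frame_centre_x, dt=0.1)
```

A positive `steering` value here would mean turning towards the right of the frame; how that maps to motor commands depends on the robot.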

Dad Smiling!

Smile Detector

Created an end-to-end system for detecting smiling faces in a live video stream using a Convolutional Neural Network.

Self driving car screenshot!

Self Driving Car
GitHub  YouTube

Learned about Deep Q-Learning by implementing it to drive a car autonomously.

Anime sample from the dataset.

Anime Classification

In this project, I trained autoencoders to learn features from 140,000 images. I then used the trained autoencoder, with added convolution layers, to classify anime images and answer questions such as the following with 74.6% accuracy:

  1. Does the image contain any nudity or sexual content? (Yes, No)
  2. Is this an interesting image or not? (Yes, No)

Publications


  • Siddhant Bansal, Seema Patel, Ishita Shah, Prof. Alpesh Patel, Prof. Jagruti Makwana, and Dr. Rajesh Thakker. "AGDC: Automatic Garbage Detection and Collection." arXiv:1908.05849

  • Siddhant Bansal, Praveen Krishnan, and C. V. Jawahar. "Fused Text Recogniser and Deep Embeddings Improve Word Recognition and Retrieval." In IAPR International Workshop on Document Analysis Systems (DAS), 2020. arXiv:2007.00166