I am an MS by Research candidate at CVIT, IIIT Hyderabad. I’m guided by Prof. C.V. Jawahar and co-guided by Prof. Chetan Arora. My research interests lie in Computer Vision, Pattern Recognition, and Machine Learning. My graduate research focuses on devising learning-based methods for understanding and exploring various aspects of first-person (egocentric) vision. Earlier, I worked on improving word recognition and retrieval in large document collections under the guidance of Prof. C.V. Jawahar. Before that, I worked with Prof. Shanmuganathan Raman on 3D Computer Vision.

My ultimate goal is to contribute to the development of systems capable of understanding the world as we do. I’m an inquisitive person, and I’m always willing to learn about fields including, but not limited to, science, technology, astrophysics, and physics.

CV / Google Scholar / Github / LinkedIn / arXiv / ORCID

News


July, 2020 : Submitted my latest work with Praveen Krishnan and Prof. C.V. Jawahar to ICPR 2020.

April, 2020 : Fused Text Recogniser and Deep Embeddings Improve Word Recognition and Retrieval got accepted to DAS 2020!

Jan, 2020 : Joining CVIT, IIIT-Hyderabad as an MS by Research student. I will be advised by Prof. C.V. Jawahar.

Aug, 2019 : Joining as a Research Fellow at CVIT, IIIT-Hyderabad under Prof. C.V. Jawahar.

June, 2019 : Completed my B.E. in ECE from Vishwakarma Government Engineering College.

See all news

siddhant.bansal@research.iiit.ac.in
Centre for Visual Information Technology (CVIT), International Institute of Information Technology (IIIT), Hyderabad, India.

Publications


Fused Text Recogniser and Deep Embeddings Improve Word Recognition and Retrieval

Fusing recognition-based and recognition-free approaches using rule-based methods for improving word recognition and retrieval. (ORAL)

Siddhant Bansal, Praveen Krishnan, and C.V. Jawahar

IAPR International Workshop on Document Analysis Systems (DAS), 2020

PDF / Demo / Project Page / Code (Github) / Poster


See all publications

Blog


Deep Future Gaze: Gaze Anticipation on Egocentric Videos Using Adversarial Networks

This paper proposes a Generative Adversarial Network (GAN) based architecture called Deep Future Gaze (DFG) for addressing the task of gaze anticipation in egocentric videos.

Link to the article!

First Person Action Recognition Using Deep Learned Descriptors

This paper proposes a three-stream convolutional neural network architecture for the task of action recognition in first-person videos.

Link to the article!

Two-Stream Convolutional Networks for Action Recognition in Videos

This paper proposes a two-stream convolutional neural network architecture for the task of action recognition in a video.

Link to the article!

 

See all articles