Siddhant’s Scratch Book

Posts

Aug 12, 2022
DBpedia GSoC 2022 (Week 12-13): Winding down the code, documentation, and API
This article summarises my progress over weeks twelve and thirteen of the GSoC coding period.
Aug 12, 2022
DBpedia GSoC 2022 (Week 10-11): Website with UPLOAD functionality
This article summarises my progress over weeks ten and eleven of the GSoC coding period.
Jul 29, 2022
DBpedia GSoC 2022 (Week 9): Website-based demo for the framework + Mid-eval
This article summarises my progress over week nine of the GSoC coding period.
Jul 22, 2022
DBpedia GSoC 2022 (Week 8): Creating the dataset
This article summarises my progress over week eight of the GSoC coding period.
Jul 15, 2022
DBpedia GSoC 2022 (Week 7): Using the embeddings to query the Knowledge Graph
This article summarises my progress over week seven of the GSoC coding period.
Jul 8, 2022
DBpedia GSoC 2022 (Week 5-6): Visiting CVPR
This article summarises my progress over weeks five and six of the GSoC coding period.
Jul 8, 2022
DBpedia GSoC 2022 (Week 3-4): Project Summary and Begin Coding
This article summarises my progress in GSoC over the two weeks after the community bonding period ended.
Jun 1, 2022
DBpedia GSoC 2022 (Week 1-2): Community Bonding
I am glad to share that my application has been selected for GSoC 2022! I will be contributing to the DBpedia Association. Under the mentorship of Edgard Marx, Ashutosh Kumar, and Nausheen Fatma, I will be working on bridging the gap between computer vision and knowledge graphs!
Sep 1, 2020
Deep Future Gaze: Gaze Anticipation on Egocentric Videos Using Adversarial Networks
This paper proposes a Generative Adversarial Network (GAN) based architecture called Deep Future Gaze (DFG) for gaze anticipation in egocentric videos. DFG takes in a single frame, generates multiple future frames, and attempts to anticipate the gaze in each generated frame. As with other GANs, DFG consists of two networks: a Generator (GN) and a Discriminator (D). Here, GN is a two-stream architecture (using 3D-CNN) that attempts to untangle the foreground and background to generate the future frames, whereas D differentiates the synthetic frames generated by GN from the real frames, thereby helping to improve GN. This enables DFG to outperform the other state-of-the-art techniques.
Aug 19, 2020
First Person Action Recognition Using Deep Learned Descriptors
This paper proposes a three-stream convolutional neural network architecture for action recognition in first-person videos. The three streams are the spatial, temporal, and Ego streams. The Ego stream is itself a two-stream architecture consisting of 2D and 3D CNNs; it takes a hand mask, head motion, and a saliency map as input to generate the class scores. When combined with the spatial and temporal streams, the Ego stream achieves a 10% gain in action recognition accuracy.
Aug 16, 2020
Two-Stream Convolutional Networks for Action Recognition in Videos
This paper proposes a two-stream convolutional neural network architecture for action recognition in videos. The spatial stream takes individual video frames and learns the spatial information, whereas the temporal stream takes a stack of optical flow images and learns the temporal information. The outputs of the two networks are combined to predict the final output. The authors analyze various inputs to the temporal network and use multitask learning to improve the performance of the architecture.
Aug 12, 2020
H+O: Unified Egocentric Recognition and 3D Hand-Object Poses and Interactions
This paper proposes an end-to-end neural network architecture that jointly identifies 3D object and hand poses while predicting the object and activity categories for an RGB image in a single pass. The authors propose a novel representation for jointly learning the 3D hand and object poses and the object and action categories. They also propose an interaction RNN for learning the interaction between the 3D hand and object along the temporal dimension.
Aug 10, 2020
Going Deeper into First-Person Activity Recognition
This paper aims to improve action recognition accuracy in egocentric videos by using a two-stream Convolutional Neural Network (CNN) architecture. Here, one stream learns the appearance information, whereas the other stream learns the motion information. The proposed two-stream CNN is able to capture object attributes and hand-object configurations.
Jan 9, 2020
An Intuitive Introduction to Linear Programming
A linear programming problem can be defined as the task of maximizing or minimizing a linear function subject to some linear constraints. The constraints can be equalities or inequalities.
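As a rough illustration of this definition, a maximization problem with inequality constraints can be written in the common matrix form below. The symbols $c$, $A$, $b$, and the small two-variable instance are assumptions chosen for illustration, not taken from the article.

```latex
% General form: maximize a linear objective subject to linear constraints
\begin{aligned}
\max_{x}\quad & c^{\top} x \\
\text{subject to}\quad & A x \le b, \\
& x \ge 0 .
\end{aligned}

% A small illustrative instance (hypothetical numbers)
\begin{aligned}
\max_{x,\,y}\quad & 3x + 2y \\
\text{subject to}\quad & x + y \le 4, \\
& x \le 2, \\
& x,\, y \ge 0 .
\end{aligned}
```

For this instance the feasible region has the vertices $(0,0)$, $(2,0)$, $(2,2)$, and $(0,4)$, where the objective takes the values $0$, $6$, $10$, and $8$, so the maximum is $10$ at $(x, y) = (2, 2)$.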
Nov 4, 2019
Decoding Connectionist Temporal Classification
Interpreting the output probability matrix of a Convolutional Recurrent Neural Network (CRNN) is essential for getting a final prediction from the trained network. Various decoding techniques are useful for this task. In this article, we’ll discuss two of those methods.
Oct 19, 2019
Explanation of Connectionist Temporal Classification
Connectionist Temporal Classification (CTC) is a type of Neural Network output helpful in tackling sequence problems like handwriting and speech recognition where the timing varies. Using CTC ensures that one does not need an aligned dataset, which makes the training process more straightforward.
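A minimal sketch of why no aligned dataset is needed: CTC maps many per-timestep label paths to the same target by merging repeated symbols and then dropping a special blank symbol. The function name `ctc_collapse`, the choice of `-` as the blank, and the example strings are assumptions for illustration only.

```python
from itertools import groupby

BLANK = "-"  # hypothetical blank symbol, chosen for this sketch


def ctc_collapse(path, blank=BLANK):
    """Map a per-timestep path to a label sequence the CTC way:
    first merge consecutive repeats, then remove blanks."""
    merged = (symbol for symbol, _ in groupby(path))
    return "".join(symbol for symbol in merged if symbol != blank)


# Paths with different timings that all collapse to the same word.
alignments = ["hh-e-ll-lo", "h-e-l-l-oo", "-hheel-lo-"]
for path in alignments:
    print(path, "->", ctc_collapse(path))
```

The blank between the two `l` symbols is what lets a path express a doubled letter; without it, the repeats would be merged into one.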