United We Stand, Divided We Fall:
UnityGraph for Unsupervised Procedure Learning from Videos

Siddhant Bansal, Chetan Arora, C.V. Jawahar

WACV 2024


What is Procedure Learning?

Given multiple videos of a task, the goal is to identify the key-steps required to perform the task and the order in which they occur.


For example, given multiple videos of making a pizza, the goal is to identify the steps required to prepare the pizza and the order in which they are performed.
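To make the expected output concrete, here is a small sketch that turns per-frame key-step labels for one video into the ordered list of key-steps. It assumes frames have already been assigned to key-step clusters, with the hypothetical label `-1` marking background frames; `ordered_key_steps` and the pizza step names are illustrative and not taken from the paper.

```python
from itertools import groupby

BACKGROUND = -1  # hypothetical label for frames that belong to no key-step


def ordered_key_steps(frame_labels):
    """Collapse per-frame key-step labels into the ordered list of key-steps.

    Background frames are dropped first, then consecutive repeats are merged,
    so [-1, 0, 0, -1, 2, 2, 1] becomes [0, 2, 1].
    """
    foreground = [label for label in frame_labels if label != BACKGROUND]
    # groupby yields one group per run of identical consecutive labels
    return [label for label, _ in groupby(foreground)]


if __name__ == "__main__":
    # Illustrative step names for a pizza video (assumed, not from the paper)
    steps = {0: "spread sauce", 1: "add cheese", 2: "bake"}
    labels = [-1, -1, 0, 0, 0, -1, 1, 1, 2, 2, -1]
    print([steps[k] for k in ordered_key_steps(labels)])
    # ['spread sauce', 'add cheese', 'bake']
```

Because background is removed before repeats are merged, a key-step interrupted only by background frames is counted once.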

Graph-based Procedure Learning (GPL)

We propose the Graph-based Procedure Learning (GPL) framework. Unlike existing graph-based frameworks, GPL requires no node or edge annotations, which enables unsupervised procedure learning.
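A minimal sketch of how a UnityGraph-style graph could be assembled from clip embeddings alone, with no node or edge labels. The construction rules here are assumptions for illustration, not the paper's exact definition: nodes are (video, clip) pairs, temporal edges join consecutive clips within a video, and cross-video ("spatial") edges join each clip to its most similar clip in every other video by cosine similarity. `build_unity_graph` and `cosine` are hypothetical helpers.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_unity_graph(videos):
    """Build an undirected graph over all clips of all videos of one task.

    videos: list of videos, each a list of per-clip embedding vectors.
    Returns a dict mapping (video_idx, clip_idx) -> set of neighbouring nodes.
    """
    graph = {(v, c): set() for v, clips in enumerate(videos) for c in range(len(clips))}

    # Temporal edges (assumed rule): consecutive clips inside the same video.
    for v, clips in enumerate(videos):
        for c in range(len(clips) - 1):
            graph[(v, c)].add((v, c + 1))
            graph[(v, c + 1)].add((v, c))

    # Cross-video edges (assumed rule): each clip links to its most
    # similar clip in every other video, using only the embeddings.
    for v, clips in enumerate(videos):
        for c, emb in enumerate(clips):
            for w, other in enumerate(videos):
                if w == v or not other:
                    continue
                best = max(range(len(other)), key=lambda j: cosine(emb, other[j]))
                graph[(v, c)].add((w, best))
                graph[(w, best)].add((v, c))
    return graph
```

Every edge is derived from the video order and the embeddings, so no annotation enters the graph; swapping in a different cross-video rule (for example, a similarity threshold) only changes the second loop.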


Graph-based Procedure Learning (GPL) framework. Given multiple videos of the same task, we create UnityGraph. Using the Node2Vec algorithm, we exploit the structure of UnityGraph to enhance the node embeddings in an unsupervised manner. For example, temporally and spatially related clips that were originally far apart in the embedding space move closer together after Node2Vec (highlighted in blue). Finally, we cluster the embeddings using KMeans and filter out the background frames to obtain the key-steps required to perform the task.
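The Node2Vec step can be sketched as follows. This shows only Node2Vec's biased second-order random walks (return parameter `p`, in-out parameter `q`) on an unweighted adjacency dict; `node2vec_walk`, `generate_walks`, and the parameter values are illustrative assumptions, and the skip-gram training that turns the walks into embeddings is omitted.

```python
import random


def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=random):
    """One Node2Vec second-order random walk on an unweighted graph.

    graph: dict mapping node -> set of neighbouring nodes.
    p: return parameter (larger p makes stepping back to the previous node rarer).
    q: in-out parameter (q < 1 favours moving outward, q > 1 keeps the walk local).
    """
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbours = sorted(graph[cur])  # sorted so a seeded rng is reproducible
        if not neighbours:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            walk.append(rng.choice(neighbours))  # first step is uniform
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbours:
            if nxt == prev:
                weights.append(1.0 / p)  # return to the previous node
            elif nxt in graph[prev]:
                weights.append(1.0)  # stay at distance 1 from prev
            else:
                weights.append(1.0 / q)  # move to distance 2 from prev
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


def generate_walks(graph, num_walks, length, p=1.0, q=1.0, seed=0):
    """Start num_walks walks from every node, in a shuffled order per round."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(node2vec_walk(graph, node, length, p, q, rng))
    return walks


if __name__ == "__main__":
    path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
    for walk in generate_walks(path, num_walks=1, length=4, p=4.0, q=0.5):
        print(walk)
```

In a full pipeline the walks would be passed as "sentences" to a skip-gram model such as gensim's `Word2Vec`, and the resulting node embeddings clustered with a KMeans implementation such as scikit-learn's `KMeans`; the paper's exact settings are not reproduced here.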



Code

Coming Soon!

Acknowledgements

The work was supported in part by the Department of Science and Technology, Government of India, under DST/ICPS/Data-Science project ID T-138. The authors thank Makarand Tapaswi and Charu Sharma for their Topics in Deep Learning course which motivated the paper’s central idea.


Please consider citing our work if you use it:

@InProceedings{UnityGraphWACV2024,
  author = "Bansal, Siddhant
    and Arora, Chetan
    and Jawahar, C.V.",
  title = "United We Stand, Divided We Fall: {UnityGraph} for Unsupervised Procedure Learning from Videos",
  booktitle = "Winter Conference on Applications of Computer Vision (WACV)",
  year = "2024"
}