An Outlook into the Future of Egocentric Vision

Chiara Plizzari 1*        Gabriele Goletto 1*        Antonino Furnari 2*        Siddhant Bansal 3*       
Francesco Ragusa 2*        Giovanni Maria Farinella 2        Dima Damen 3        Tatiana Tommasi 2       

1Politecnico di Torino, Italy 2University of Catania, Italy 3University of Bristol, UK
*denotes equal contribution

Accepted at IJCV, April 2024

Futuristic Survey Paper [ArXiv]


We envision a wearable device, EgoAI, that enables in-situ multimodal sensing from the wearer's perspective and provides ego-based assistance. The envisaged future takes the shape of five distinctive use cases, each grounded in either a location or an occupation; Ego-Designer, Ego-Worker, and Ego-Tourist are shown here as examples.

Abstract

What will the future be? We wonder! In this survey, we explore the gap between current research in egocentric vision and the ever-anticipated future, where wearable computing, with outward-facing cameras and digital overlays, is expected to be integrated in our everyday lives. To understand this gap, the article starts by envisaging the future through character-based stories, showcasing through examples the limitations of current technology. We then provide a mapping between this future and previously defined research tasks. For each task, we survey its seminal works, current state-of-the-art methodologies and available datasets, then reflect on shortcomings that limit its applicability to future research. Note that this survey focuses on software models for egocentric vision, independent of any specific hardware. The paper concludes with recommendations for areas of immediate exploration so as to unlock our path to the future always-on, personalised and life-enhancing egocentric vision.

Imagining the Future

Here we showcase the five scenarios in which EgoAI is envisioned to be used. Each scenario is grounded in a specific location or occupation.


EgoAI accompanies Claire throughout her itinerary in Turin

From Narratives to Research Tasks


We connect the narratives in our stories to research tasks in egocentric vision. For each use case, we show the corresponding research tasks, along with the specific parts of the story where they occur. For example, for Ego-Home, Section 4.2 on 3D Scene Understanding corresponds to tasks 1, 2, 3, 4, 7, 8, and 9 in the story.

Research Tasks and Capabilities

We explore the following egocentric vision tasks:

Localisation
3D Scene Understanding
Recognition
Anticipation
Gaze Understanding and Prediction
Social Behaviour Understanding
Full-body Pose Estimation
Hand and Hand-Object Interactions
Person Identification
Summarisation
Dialogue
Privacy

For these topics, instead of attempting to cover the entire spectrum of progress within the field, we prioritise seminal works that laid the foundation for each task or significantly influenced its trajectory. We also highlight state-of-the-art methods that currently achieve the best performance, and mention specific datasets tailored to advance research in these areas. Each subsection concludes with a brief reflection on the gap between the current state of the art and the envisioned future.

In this way, we review 464 papers in egocentric vision!


Please consider citing the paper if you make use of this work:

@article{plizzari2024outlook,
  title={An Outlook into the Future of Egocentric Vision},
  author={Chiara Plizzari and Gabriele Goletto and Antonino Furnari and Siddhant Bansal and Francesco Ragusa and Giovanni Maria Farinella and Dima Damen and Tatiana Tommasi},
  journal={International Journal of Computer Vision},
  year={2024}
}

Acknowledgements

We thank Fritz J. Rustan, Illustrator at 99designs, for the fruitful and close collaboration in producing the EgoAI illustrations (Fig. 1 - Fig. 5).
We thank Mirco Planamente for early discussions on this survey and initial collection of relevant papers.
Research at the University of Bristol is supported by EPSRC Program Grant Visual AI EP/T028572/1. D. Damen is supported by EPSRC Fellowship UMPIRE EP/T004991/1.
Research at the University of Catania has been supported by the project Future Artificial Intelligence Research (FAIR) – PNRR MUR Cod. PE0000013 - CUP: E63C22001940006.
T. Tommasi is supported by the project FAIR - Future Artificial Intelligence Research and received funding from the European Union Next-GenerationEU (PIANO NAZIONALE DI RIPRESA E RESILIENZA (PNRR) – MISSIONE 4 COMPONENTE 2, INVESTIMENTO 1.3 – D.D. 1555 11/10/2022, PE00000013). C. Plizzari and G. Goletto acknowledge travel support from ELISE (GA no 951847). G. Goletto is supported by PON “Ricerca e Innovazione” 2014-2020 – DM 1061/2021 funds.