Video AI Symposium

30 Sep - 1 Oct 2023

Google DeepMind Offices, London


We are building on a great tradition of video understanding symposia and summits, previously held in Europe (2019, 2022) and the US (2017, 2019), and invite you to come together for a much-needed discussion on the next steps for video understanding. Despite tremendous effort and a steady increase in the number and scale of video datasets, video understanding remains a bottleneck for research.

We therefore invite you to a 1.5-day, invitation-only closed event to exchange ideas, establish close collaborations, and shape and recommend new research directions.

The Video AI Symposium will be held in central London, at the Google DeepMind offices, on Saturday 30 Sep and Sunday 1 Oct 2023. The event will bring together 50 researchers to exchange ideas and connect over a shared interest in video understanding.

Sponsored by: Google DeepMind, Google Research and Meta AI

Confirmed Attendees

Rahul Sukthankar

Google Research

Cordelia Schmid

INRIA, Google Research

Andrew Zisserman

University of Oxford, Google DeepMind

Jitendra Malik

UC Berkeley and Meta AI

William Freeman

MIT and Google Research

Cees Snoek

University of Amsterdam

Alexei Efros

UC Berkeley

Ivan Laptev


Michal Irani

Weizmann Institute

Andrea Vedaldi

University of Oxford and Meta

Kristen Grauman

Meta AI and UT Austin

Joao Carreira

Google DeepMind

Dima Damen

University of Bristol and Google DeepMind

Joseph Tighe

Meta AI

Juan Carlos Niebles

Salesforce and Stanford University

Efstratios Gavves

University of Amsterdam

Juergen Gall

University of Bonn

Carl Vondrick

Columbia University

Hilde Kuehne

University of Bonn and MIT-IBM Watson AI Lab

Thomas Kipf

Google DeepMind

Angela Yao

National University of Singapore

Limin Wang

Nanjing University

David Fouhey

New York University

Michael S. Ryoo

Stony Brook University and Google DeepMind

Andrew Owens

University of Michigan

Ishan Misra

Meta AI

Gul Varol

École des Ponts ParisTech

Carl Doersch

Google DeepMind

Rohit Girdhar

Meta AI

Adam Harley

Stanford University

Arsha Nagrani

Google Research

Jean-Baptiste Alayrac

Google DeepMind

Weidi Xie

Shanghai Jiao Tong University

Laura Sevilla-Lara

University of Edinburgh

Tengda Han

University of Oxford

Toby Perrett

University of Bristol

Yuki Asano

University of Amsterdam

Hazel Doughty

University of Leiden

Ankush Gupta

Google DeepMind

Viorica Patraucean

Google DeepMind

Olivier Henaff

Google DeepMind

Dilara Gokay

Google DeepMind

Adria Recasens

Google DeepMind

Yi Yang

Google DeepMind

Ross Goroshin

Google DeepMind

Karel Lenc

Google DeepMind

Skanda Koppula

Google DeepMind

Mateusz Malinowski

Google DeepMind

Chuhan Zhang

Google DeepMind

Daniel Zoran

Google DeepMind

Yusuf Aytar

Google DeepMind

Pauline Luc

Google DeepMind


Local Organisation and Host



Saturday 30 Sep
08:45-09:30 Breakfast
09:30-09:45 Introduction
09:45-10:45 Rohit Girdhar: Learning visual representations with minimal supervision
            Limin Wang: Towards building video foundation models
            Christoph Feichtenhofer: Self-Supervised Video Understanding
10:45-11:15 Open discussion: How to learn from raw videos?
11:15-11:45 Coffee break
11:45-12:45 Kristen Grauman: See What I See and Hear What I Hear: First-Person Perception and the Future of AR and Robotics
            Michael Ryoo: Video Representations for Robot Learning
            Arsha Nagrani: How can LLMs help with video understanding?
12:45-13:30 Lunch
13:30-14:10 Bill Freeman: Watching videos out of the corner of your eye
            Philipp Krähenbühl: Towards faster video models
            Stratis Gavves: Causal Computer Vision towards Embodied General Intelligence
14:10-14:40 Open discussion: One dataset to solve it all, from TikTok to robotics
14:40-15:00 Coffee break
15:00-16:00 Cordelia Schmid: Dense video captioning and beyond
            Cees Snoek: Towards Human-Aligned Video-AI
            Dima Damen: Should we still seek fine-grained perception in video?
16:00-16:20 Coffee break
16:20-17:00 Carl Vondrick: System 2 and Video
17:00-17:45 Open discussion: The crisis of downstream tasks. Are current benchmarks a good measure of research progress?
19:00-22:00 Dinner

Sunday 1 Oct
08:45-09:30 Breakfast
09:30-10:30 Joseph Tighe: A new benchmark for an embodied AI assistant
            Adam Harley: Tracking Any Pixel in a Video
            Carl Doersch: Tracking Any Point
10:30-10:45 Coffee break
10:45-11:45 Gul Varol: Beyond Text Queries for Search: Composed Video Retrieval
            Andrew Owens: Multimodal Learning from the Bottom Up
            Jitendra Malik: Unsolved problems in video understanding
11:45-12:00 Coffee break
12:00-12:40 Angela Yao: VideoQA in the Time of Large Language Models
            Laura Sevilla-Lara: Video Understanding Using Less Compute and Less Training Data
12:40-13:15 Open discussion: Camera view vs. world view. Should video be studied in 3D?
13:15-14:30 Lunch and preparation for departure
14:30       10-minute walk to St Pancras station for the Eurostar
16:31-19:47 Eurostar to Paris Gare du Nord (departure-arrival)