We are building on a great tradition of video understanding symposia and summits previously held in Europe (2019, 2022) and the US (2017, 2019), and we invite you to come together for a much-needed discussion on the next steps for video understanding. Despite tremendous effort and the growing number and scale of video datasets, video understanding remains a bottleneck for research.
We thus invite you to a 1.5-day, invitation-only closed event to exchange ideas, establish tight collaborations, and recommend new research directions.
The Video AI Symposium will be held in Central London at the Google DeepMind offices on Saturday 30 September and Sunday 1 October 2023. The event will bring together 50 researchers for an opportunity to exchange ideas and connect over a mutual interest in video understanding.
Sponsored by: Google DeepMind, Google Research and Meta AI
Google Research
INRIA, Google Research
University of Oxford, Google DeepMind
UC Berkeley and Meta AI
MIT and Google Research
University of Amsterdam
Meta AI
UC Berkeley
INRIA Paris
Weizmann Institute
University of Oxford and Meta
MIT
Meta AI and UT Austin
Google DeepMind
University of Bristol and Google DeepMind
Meta AI
Meta AI
Salesforce - Stanford University
University of Amsterdam
University of Bonn
KAUST
Columbia University
University of Bonn and MIT-IBM Watson AI Lab
Google DeepMind
Singapore University
Nanjing University
New York University
Stony Brook University and Google DeepMind
University of Michigan
Meta AI
École des Ponts ParisTech
Google DeepMind
Meta AI
Stanford University
Google Research
Google DeepMind
Shanghai Jiao Tong University
University of Edinburgh
University of Oxford
University of Bristol
University of Amsterdam
University of Leiden
Google DeepMind
Google DeepMind
Google DeepMind
Google DeepMind
Google DeepMind
Google DeepMind
Google DeepMind
Google DeepMind
Google DeepMind
Google DeepMind
Google DeepMind
Google DeepMind
Google DeepMind
Google DeepMind
Saturday 30 September

| Time | Session | Topic |
| --- | --- | --- |
| 08:45-09:30 | Breakfast | |
| 09:30-09:45 | Introduction | |
| 09:45-10:45 | Rohit Girdhar | Learning visual representations with minimal supervision |
| | Limin Wang | Towards building video foundation models |
| | Christoph Feichtenhofer | Self-Supervised Video Understanding |
| 10:45-11:15 | Open Discussion | How to learn from Raw Videos? |
| 11:15-11:45 | Coffee Break | |
| 11:45-12:45 | Kristen Grauman | See What I See and Hear What I Hear: First-Person Perception and the Future of AR and Robotics |
| | Michael Ryoo | Video Representations for Robot Learning |
| | Arsha Nagrani | How can LLMs help with video understanding? |
| 12:45-13:30 | Lunch | |
| 13:30-14:10 | Bill Freeman | Watching videos out of the corner of your eye |
| | Stratis Gavves | Causal Computer Vision towards Embodied General Intelligence |
| 14:10-14:40 | Open Discussion | One dataset to solve it all - from tiktok to robotics |
| 14:40-15:00 | Coffee Break | |
| 15:00-16:00 | Cordelia Schmid | Dense video captioning and beyond |
| | Cees Snoek | Towards Human-Aligned Video-AI |
| | Dima Damen | Should we still seek fine-grained perception in video? |
| 16:00-16:20 | Coffee Break | |
| 16:20-17:00 | Carl Vondrick | System 2 and Video |
| 17:00-17:45 | Open Discussion | The crisis of downstream tasks... Are current benchmarks a good measure of research progress? |
| 19:00-22:00 | Dinner | |
Sunday 1 October

| Time | Session | Topic |
| --- | --- | --- |
| 08:45-09:30 | Breakfast | |
| 09:30-10:30 | Joseph Tighe | A new benchmark for an embodied AI assistant |
| | Adam Harley | Tracking Any Pixel in a Video |
| | Carl Doersch | Tracking Any Point |
| 10:30-10:45 | Coffee Break | |
| 10:45-11:45 | Gul Varol | Beyond Text Queries for Search: Composed Video Retrieval |
| | Andrew Owens | Multimodal Learning from the Bottom Up |
| | Jitendra Malik | Unsolved problems in video understanding |
| 11:45-12:00 | Coffee Break | |
| 12:00-12:40 | Angela Yao | VideoQA in the Time of Large Language Models |
| | Laura Sevilla Lara | Video Understanding Using Less Compute and Less Training Data |
| 12:40-13:15 | Open Discussion | Camera view vs world view - should video be studied in 3D? |
| 13:15-14:30 | Lunch and preparation to leave for the train | |
| 14:30- | 10-minute walk to St Pancras Station for the Eurostar train | |
| 16:31-19:47 | Eurostar train to Paris Gare du Nord (departure-arrival) | |