A Darkhalil, R Guerrier, A W Harley, D Damen (2025). EgoPoints: Advancing Point Tracking for Egocentric Videos. IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). ArXiv | Webpage | Code and Benchmark |
T Soucek, P Gatti, M Wray, I Laptev, D Damen, J Sivic (2024). ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions. ArXiv. ArXiv | Website | Code | Dataset [Coming Soon] |
T Perrett, T Han, D Damen, A Zisserman (2024). It's Just Another Day: Unique Video Captioning by Discriminative Prompting. ACCV (Oral Presentation). ArXiv | Website | Code and Benchmark |
G Goletto, T Nagarajan, G Averta, D Damen (2024). AMEGO: Active Memory from long EGOcentric videos. ECCV ArXiv | Website | AMB Benchmark | Code |
S Bansal, M Wray, D Damen (2024). HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision. ArXiv | Website | HOI-QA Dataset | Models and Code |
TIM: A Time Interval Machine for Audio-Visual Action Recognition. Jacob Chalk, Jaesung Huh, Evangelos Kazakos, Andrew Zisserman, Dima Damen (2024). IEEE/CVF Computer Vision and Pattern Recognition (CVPR). Webpage | Code and Models | ArXiv | CVF PDF |
Spatial Cognition from Egocentric Video: Out of Sight, Not Out of Mind. Chiara Plizzari, Shubham Goel, Toby Perrett, Jacob Chalk, Angjoo Kanazawa, Dima Damen (2025). 3DV. Webpage | ArXiv | Video |
Every Shot Counts: Using Exemplars for Repetition Counting in Videos. Saptarshi Sinha, Alexandros Stergiou, Dima Damen (2024). Asian Conference on Computer Vision (ACCV). Webpage | Code | ArXiv We propose an exemplar-based approach that discovers visual correspondence of video exemplars across repetitions within target videos. Our proposed Every Shot Counts (ESCounts) model is an attention-based encoder-decoder that encodes videos of varying lengths alongside exemplars from the same and different videos. |
GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos. Tomas Soucek, Dima Damen, Michael Wray, Ivan Laptev, Josef Sivic (2024). IEEE/CVF Computer Vision and Pattern Recognition (CVPR). Webpage | Code | ArXiv | CVF PDF |
Get a Grip: Reconstructing Hand-Object Stable Grasps in Egocentric Videos. Zhifan Zhu and Dima Damen (2024). ArXiv. Webpage | EPIC-Grasps Dataset and Code | ArXiv (v2) |
Rank2Reward: Learning Shaped Reward Functions from Passive Video. Daniel Yang, Davin Tjia, Jacob Berg, Dima Damen, Pulkit Agrawal and Abhishek Gupta (2024). IEEE International Conference on Robotics and Automation (ICRA). Webpage | ArXiv |
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives. K Grauman et al. (2024). IEEE/CVF Computer Vision and Pattern Recognition (CVPR). ArXiv, Webpage, PDF | CVF PDF |
An Outlook into the Future of Egocentric Vision. C Plizzari*, G Goletto*, A Furnari*, S Bansal*, F Ragusa*, GM Farinella, D Damen, T Tommasi. (2024). International Journal of Computer Vision (IJCV). PDF | Open Review Preprint | ArXiv |
Learning Temporal Sentence Grounding From Narrated EgoVideos. K Flanagan, D Damen, M Wray (2023). British Machine Vision Conference (BMVC). ArXiv Camera Ready | Project Webpage | Code and Models |
EPIC Fields: Marrying 3D Geometry and Video Understanding. V Tschernezki*, A Darkhalil*, Z Zhu*, D Fouhey, I Laina, D Larlus, D Damen, A Vedaldi (2023). Neural Information Processing Systems (NeurIPS) Preprint, Webpage |
What can a cook in Italy teach a mechanic in India? Action Recognition Generalisation Over Scenarios and Locations. C Plizzari, T Perrett, B Caputo, D Damen. ICCV 2023 Preprint | Webpage | Dataset | Code |
Use Your Head: Improving Long-Tail Video Recognition. T Perrett, S Sinha, T Burghardt, M Mirmehdi, D Damen. CVPR 2023. CVF PDF | CVF Supp | ArXiv | Webpage | Benchmarks, Code and Models |
The Wisdom of Crowds: Temporal Progressive Attention for Early Action Prediction. A Stergiou, D Damen. CVPR 2023. CVF PDF | CVF Supp | ArXiv | Webpage | Code |
EPIC-SOUNDS: A Large-Scale Dataset of Actions that Sound. J Huh*, J Chalk*, E Kazakos, D Damen, A Zisserman. Journal Extended Version (Under Review) (2024) ArXiv, Webpage EPIC-SOUNDS: A Large-Scale Dataset of Actions that Sound. J Huh*, J Chalk*, E Kazakos, D Damen, A Zisserman. ICASSP 2023. ArXiv, Webpage |
Play It Back: Iterative Attention for Audio Recognition. A Stergiou, D Damen. ICASSP 2023. ArXiv, Webpage |
EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations. A Darkhalil, D Shan, B Zhu, J Ma, A Kar, R Higgins, S Fidler, D Fouhey, D Damen. NeurIPS 2022. PDF, Webpage | Trailer | Reveal @EPIC2022 | Download |
ConTra: Context Transformer for Cross-Modal Retrieval. A Fragomeni, M Wray, D Damen. ACCV (2022) Oral. ArXiv | PDF Preprint | Project Webpage | Code
Egocentric Video-Language Pretraining. KQ Lin, AJ Wang, M Soldan, M Wray, R Yan, EZ Xu, D Gao, R Tu, W Zhao, W Kong, C Cai, H Wang, D Damen, B Ghanem, W Liu, MZ Shou. NeurIPS (2022). ArXiv | PDF Preprint | Project Webpage | Code
UnweaveNet: Unweaving Activity Stories. W Price, C Vondrick, D Damen. CVPR (2022). ArXiv Paper | Project Webpage | Annotations
Around the World in 3,000 Hours of Egocentric Video. K Grauman (+83 Authors) et al. CVPR (2022). ArXiv
With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition. E Kazakos, J Huh, A Nagrani, A Zisserman, D Damen. BMVC (2021). ArXiv Paper | Project Webpage | Code, features and models
Trailer | Video Demonstration | Webinar | Download Rescaling Egocentric Vision. D Damen, H Doughty, G Farinella, A Furnari, E Kazakos, J Ma, D Moltisanti, J Munro, T Perrett, W Price, M Wray. IJCV. IJCV paper, ArXiv, Webpage The EPIC-KITCHENS Dataset: Collection, Challenges and Baselines. D Damen, H Doughty, GM Farinella, S Fidler, A Furnari, E Kazakos, D Moltisanti, J Munro, T Perrett, W Price, M Wray. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(11) pp 4125-4141 (2021). IEEE, Arxiv Preprint |
Domain Adaptation in Multi-View Embedding for Cross-Modal Video Retrieval. J Munro, M Wray, D Larlus, G Csurka, D Damen. ArXiv (2021). ArXiv Paper
On Semantic Similarity in Video Retrieval. M Wray, H Doughty, D Damen. CVPR (2021). CVF PDF | ArXiv Preprint | Project Webpage | Video
Temporal-Relational CrossTransformers for Few-Shot Action Recognition. T Perrett, A Masullo, T Burghardt, M Mirmehdi, D Damen. CVPR (2021). CVF PDF | ArXiv Preprint | Code and Model | Project Webpage
Slow-Fast Auditory Streams For Audio Recognition. E Kazakos, A Nagrani, A Zisserman, D Damen. ICASSP (2021). ArXiv Preprint | IEEE PDF | Code and Models | Project Webpage [Outstanding Paper]
Interactive Dashboard | Teaser Video | Code
Play Fair: Frame Attributions in Video Models. W Price, D Damen. ACCV (2020). ArXiv Preprint | Project Details | CVF | CVF PDF
MetaLearning with Context-Agnostic Initialisation. T Perrett, A Masullo, T Burghardt, M Mirmehdi, D Damen. ACCV (2020). ArXiv Preprint | CVF | CVF PDF | Project Details
Action Modifiers: Learning from Adverbs in Instructional Videos. H Doughty, I Laptev, W Mayol-Cuevas, D Damen. CVPR (2020). ArXiv Preprint, CVF PDF, Project Details
Video, Oral Presentation Video
Multi-modal Domain Adaptation for Fine-grained Action Recognition. J Munro, Dima Damen. CVPR (2020). ArXiv Preprint, CVF PDF, Project Details, Code
Retro-Actions: Learning 'Close' by Time-Reversing 'Open' Videos. W Price, Dima Damen. ICCV (2019). ArXiv Preprint, Project Details
Fine-Grained Action Retrieval through Multiple Parts-of-Speech Embeddings. Michael Wray, Diane Larlus, Gabriela Csurka, Dima Damen. ICCV (2019). CVF PDF, ArXiv Preprint, Project Details
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition. Evangelos Kazakos, Arsha Nagrani, Andrew Zisserman, Dima Damen. ICCV (2019). Project Details, CVF PDF, Arxiv Preprint
Learning Visual Actions Using Multiple Verb-Only Labels. M Wray, D Damen. BMVC (2019). ArXiv Preprint, Project Details
DDLSTM: Dual-Domain LSTM for Cross-Dataset Action Recognition. T Perrett and D Damen. CVPR (2019). pdf preprint, Arxiv, Project Details
The Pros and Cons: Rank-aware Temporal Attention for Skill Determination in Long Videos. H Doughty, W Mayol-Cuevas, D Damen. CVPR (2019). pdf preprint, Arxiv, Project Details
Action Recognition from Single Timestamp Supervision in Untrimmed Videos. D Moltisanti, S Fidler and D Damen. CVPR (2019). pdf preprint, Project Details
(2021) B Sullivan, C Ludwig, D Damen, W Mayol-Cuevas, I Gilchrist. Look-Ahead Fixations During Visuomotor Behavior: Evidence from Assembling a Camping Tent. Journal of Vision 21(3):13. PDF
EPIC-Tent: An Egocentric Video Dataset for Camping Tent Assembly. Y Jang, B Sullivan, C Ludwig, I.D. Gilchrist, D Damen and W Mayol-Cuevas. ICCV Workshops (2019). pdf, Project Details, Dataset, Annotations
Scaling Egocentric Vision: The EPIC-KITCHENS Dataset. D Damen, H Doughty, G Farinella, S Fidler, A Furnari, E Kazakos, D Moltisanti, J Munro, T Perrett, W Price, M Wray. ECCV (2018). Webpage | Dataset | arxiv An Evaluation of Action Recognition Models on EPIC-Kitchens. W Price, D Damen. Arxiv (2019) Arxiv | Github | PDF The EPIC-KITCHENS Dataset: Collection, Challenges and Baselines. D Damen, H Doughty, GM Farinella, S Fidler, A Furnari, E Kazakos, D Moltisanti, J Munro, T Perrett, W Price, M Wray. IEEE Transactions on Pattern Analysis and Machine Intelligence (2020). Arxiv Preprint |
Who's Better? Who's Best? Pairwise Deep Ranking for Skill Determination. H Doughty, D Damen, W Mayol-Cuevas. CVPR (2018). PDF | arxiv | Dataset
Weakly-Supervised Completion Moment Detection using Temporal Attention. F Heidarivincheh, M Mirmehdi, D Damen. ICCV Workshop on Human Behaviour Understanding. Arxiv | CVF PDF, Oct 2019.
Action Completion: A Temporal Model for Moment Detection. F Heidarivincheh, M Mirmehdi, D Damen. British Machine Vision Conference (BMVC), Sep 2018. Arxiv PDF | Dataset
Beyond Action Recognition: Action Completion in RGB-D Data. F Heidarivincheh, M Mirmehdi, D Damen. British Machine Vision Conference (BMVC), Sep 2016. pdf | abstract | Dataset
Human Routine Change Detection using Bayesian Modelling. Y Xu, D Damen. ICPR (2018). PDF
Unsupervised Long-Term Routine Modelling using Dynamic Bayesian Networks. Y Xu, D Bull, D Damen. DICTA (2017). PDF
Trespassing the Boundaries: Labeling Temporal Bounds for Object Interactions in Egocentric Video. D Moltisanti, M Wray, W Mayol-Cuevas, D Damen. International Conference on Computer Vision (ICCV), 2017. pdf (camera ready) | arxiv
SEMBED: Semantic Embedding of Egocentric Action Videos. M Wray, D Moltisanti, W Mayol-Cuevas, D Damen. Egocentric Interaction, Perception and Computing (EPIC), European Conference on Computer Vision Workshops (ECCVW), Oct 2016. pdf | Dataset
Automated capture and delivery of assistive task guidance with an eyewear computer: The GlaciAR system. T Leelasawassuk, D Damen, W Mayol-Cuevas. Augmented Human, Mar 2017 pdf
You-Do, I-Learn: Discovering Task Relevant Objects and their Modes of Interaction from Multi-User Egocentric Video. D Damen, T Leelasawassuk, O Haines, A Calway, W Mayol-Cuevas. British Machine Vision Conference (BMVC), Sep 2014. PDF | Abstract | Dataset
Multi-user egocentric Online System for Unsupervised Assistance on Object Usage. D Damen, O Haines, T Leelasawassuk, A Calway, W Mayol-Cuevas. ECCV Workshop on Assistive Computer Vision and Robotics (ACVR), Sep 2014. PDF Preprint
Estimating Visual Attention from a Head Mounted IMU. T Leelasawassuk, D Damen, W Mayol-Cuevas. International Symposium on Wearable Computers (ISWC), Sep 2015. PDF
Real-time RGB-D Tracking with Depth Scaling Kernelised Correlation Filters and Occlusion Handling. M Camplani, S Hannuna, M Mirmehdi, D Damen, L Tao, T Burghardt and A Paiement. British Machine Vision Conference (BMVC), Sep 2015. PDF.
Real-time Learning and Detection of 3D Texture-minimal Objects: A Scalable approach. D Damen, P Bunnun, A Calway, W Mayol-Cuevas. British Machine Vision Conference (BMVC), Sep 2012. PDF | Abstract | Code | Video | Dataset.
Efficient Texture-less Object Detection for Augmented Reality Guidance. T Hodan, D Damen, W Mayol-Cuevas, J Matas. IEEE Int. Symposium on Mixed and Augmented Reality (ISMAR) Workshop on Visual Recognition and Retrieval for Mixed and Augmented Reality, Sep 2015.
Cognitive Learning, Monitoring and Assistance of Industrial Workflows Using Egocentric Sensor Networks. G Bleser, D Damen, A Behera, et al. PLOS ONE, June 2015. PDF.
Egocentric Real-time Workspace Monitoring using an RGB-D Camera. D Damen, A Gee, W Mayol-Cuevas, A Calway. Intelligent Robots and Systems (IROS), Oct 2012. PDF | Video.
Online quality assessment of human movement from skeleton data. A Paiement, L Tao, S Hannuna, M Camplani, D Damen and M Mirmehdi. British Machine Vision Conference (BMVC), Sep 2014. PDF | Dataset.
Explaining Activities as Consistent Groups of Events - A Bayesian Framework using Attribute Multiset Grammars. D Damen and D Hogg. International Journal of Computer Vision (IJCV), 2012. PDF.
Recognizing Linked Events: Searching the Space of Feasible Explanations. D Damen and D Hogg. Computer Vision and Pattern Recognition (CVPR), Miami, Florida, June 2009. PDF | Poster
Detecting Carried Objects from Sequences of Walking Pedestrians. D Damen and D Hogg. Pattern Analysis and Machine Intelligence (PAMI), 2012. PDF.
Detecting Carried Objects in Short Video Sequences. D Damen and D Hogg. European Conference on Computer Vision (ECCV), Marseille, France, Oct 2008 PDF | Poster