Who's Better? Who's Best?: Pairwise Deep Ranking for Skill Determination

March 2018

Hazel Doughty, Walterio Mayol-Cuevas, Dima Damen

At CVPR 2018, we presented a method for assessing skill from video, applicable to a variety of tasks ranging from surgery to drawing and rolling pizza dough. We formulate the problem as pairwise (who's better) and overall (who's best) ranking of video collections, using supervised deep ranking. We propose a novel loss function that learns discriminative features when a pair of videos exhibits variance in skill, and shared features when a pair of videos exhibits comparable skill levels. Results demonstrate that our method is applicable across tasks, with the percentage of correctly ordered pairs of videos ranging from 70% to 83% across four datasets. We demonstrate the robustness of our approach via a sensitivity analysis of its parameters.
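To make the pairwise formulation concrete, here is a minimal sketch of a margin-based ranking term, a similarity term, and the pairwise accuracy metric. It is an illustration under stated assumptions rather than the released implementation: the function names, the margin value, and the scalar skill scores are all hypothetical, and the scores stand in for the output of a learned network.

```python
def ranking_loss(score_better, score_worse, margin=1.0):
    """Hinge term for a pair with different skill (assumed form).

    Zero once the better video outscores the worse one by at least `margin`.
    """
    return max(0.0, margin - (score_better - score_worse))


def similarity_loss(score_a, score_b, margin=1.0):
    """Term for a pair with comparable skill (assumed form).

    Zero while the two scores stay within `margin` of each other.
    """
    return max(0.0, abs(score_a - score_b) - margin)


def pairwise_accuracy(scores, ordered_pairs):
    """Fraction of (better, worse) index pairs that the scores order correctly."""
    correct = sum(1 for better, worse in ordered_pairs
                  if scores[better] > scores[worse])
    return correct / len(ordered_pairs)


# Hypothetical scores for three videos and three annotated (better, worse) pairs.
scores = [2.5, 1.0, 0.2]
pairs = [(0, 1), (0, 2), (2, 1)]
accuracy = pairwise_accuracy(scores, pairs)  # two of the three pairs are ordered correctly
```

Sorting videos by their score gives the overall (who's best) ranking, while comparing two scores answers the pairwise (who's better) question.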

Skill Determination Overview

In December 2018, we presented a new model to determine relative skill from long videos through learnable temporal attention modules. We propose to train rank-specific temporal attention modules, learned with only video-level supervision, using a novel rank-aware loss function. In addition to attending to task-relevant video parts, our proposed loss jointly trains two attention modules to separately attend to video parts that are indicative of higher (pros) and lower (cons) skill.
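The idea of temporal attention over a long video can be sketched as a softmax-weighted pooling of per-segment scores. This is only an illustrative sketch: the helper names, the per-segment scores, and the attention logits below are hypothetical, and the rank-aware loss that trains the two modules is not reproduced here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of attention logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(segment_scores, attention_logits):
    """Video-level score as an attention-weighted sum of segment scores."""
    weights = softmax(attention_logits)
    return sum(w * s for w, s in zip(weights, segment_scores))


# Hypothetical per-segment skill scores for one long video.
segment_scores = [0.9, 0.1, 0.8, 0.2]

# Two hypothetical attention modules: one favours the segments indicative
# of higher skill ("pros"), the other those indicative of lower skill ("cons").
pros_logits = [2.0, -2.0, 2.0, -2.0]
cons_logits = [-2.0, 2.0, -2.0, 2.0]

pros_score = attention_pool(segment_scores, pros_logits)
cons_score = attention_pool(segment_scores, cons_logits)
```

With uniform logits the pooling reduces to a plain average over segments, which is a useful baseline to compare against when checking what an attention module has learned.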



Hazel Doughty, Walterio Mayol-Cuevas, Dima Damen (2018). The Pros and Cons: Rank-aware Temporal Attention for Skill Determination in Long Videos. arXiv.

Hazel Doughty, Dima Damen and Walterio Mayol-Cuevas (2018). Who's Better, Who's Best: Skill Determination in Video using Deep Ranking. CVPR. PDF arXiv


The EPIC-Skills 2018 Dataset is available here. It contains the videos for the Drawing and Chopstick Using tasks alongside the annotations for all tasks.

The Surgery videos are taken from the JIGSAWS dataset and the Dough Rolling videos are taken from the CMU-MMAC pizza making activity.