In CVPR 2018, we presented a method for assessing the skill of a performance from video, applicable to a variety of tasks ranging from surgery to drawing
and rolling pizza dough. We formulate the problem as pairwise (who's better?) and overall (who's best?) ranking of video collections,
using supervised deep ranking. We propose a novel loss function that learns discriminative features when a pair of videos exhibits a difference in skill,
and shared features when the pair exhibits comparable skill levels. Results demonstrate that our method is applicable across tasks: the
percentage of correctly ordered video pairs ranges from 70% to 83% across four datasets. We demonstrate the robustness of our approach via a
sensitivity analysis of its parameters.
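The two-term loss described above can be sketched as follows. This is a minimal illustration, not the exact published formulation: the margin value, the use of a feature-sum as a stand-in skill score, and the squared-difference similarity term are all assumptions made for the sake of a runnable example.

```python
import numpy as np

def skill_rank_loss(f_better, f_worse, comparable, margin=1.0):
    """Hedged sketch of a pairwise skill-ranking loss.

    f_better, f_worse: 1-D feature vectors for the two videos
                       (f_better is annotated as the higher-skill one).
    comparable: True if the pair exhibits similar skill levels.
    """
    if comparable:
        # Shared features: penalise feature disparity when the pair
        # shows comparable skill, pulling representations together.
        return float(np.sum((f_better - f_worse) ** 2))
    # Discriminative features: a hinge ranking term pushes the score of
    # the higher-skill video above the lower-skill one by a margin.
    # (Scoring by summing features is a placeholder for a learned head.)
    s_better, s_worse = f_better.sum(), f_worse.sum()
    return float(max(0.0, margin - (s_better - s_worse)))
```

In training, pairs with a clear skill gap would contribute the ranking term while similar-skill pairs contribute the similarity term, so the network learns both discriminative and shared features from the same collection.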
In December 2018, we presented a new model to determine relative skill from long videos, through learnable temporal attention modules. We propose to train rank-specific temporal attention modules, learned with only video-level supervision, using a novel rank-aware loss function. In addition to attending to task-relevant video parts, the proposed loss jointly trains two attention modules to separately attend to video parts indicative of higher (pros) and lower (cons) skill.
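The pros/cons attention idea can be sketched as below. This is a simplified stand-in, not the published architecture: the dot-product attention logits, the single scoring vector `v`, and combining the two attended features by subtraction are illustrative assumptions only.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array of logits.
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(segments, w):
    """Temporal attention pooling over per-segment features.

    segments: (T, D) array, one feature vector per video segment.
    w: (D,) attention parameters (a learned projection in practice;
       a plain dot product here as a stand-in).
    """
    logits = segments @ w      # (T,) one logit per segment
    alpha = softmax(logits)    # attention weights over time, sum to 1
    return alpha @ segments    # (D,) attended video-level feature

def skill_score(segments, w_pros, w_cons, v):
    """Hedged sketch of rank-aware attention: one module attends to
    evidence of higher skill (pros), another to evidence of lower
    skill (cons); a scoring vector v combines the two attended
    features into a scalar skill score."""
    f_pros = attention_pool(segments, w_pros)
    f_cons = attention_pool(segments, w_cons)
    return float(f_pros @ v - f_cons @ v)
```

Because only the video-level ranking supervises training, the two attention modules are free to discover which moments of a long video indicate skilled versus unskilled execution.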
Hazel Doughty, Walterio Mayol-Cuevas and Dima Damen (2018). The Pros and Cons: Rank-aware Temporal Attention for Skill Determination in Long Videos. arXiv preprint.
Hazel Doughty, Dima Damen and Walterio Mayol-Cuevas (2018). Who's Better, Who's Best: Skill Determination in Video using Deep Ranking. CVPR.