Iterative Contrast-Classify for Semi-supervised Temporal Action Segmentation

Authors: Dipika Singhania, Rahul Rahaman, Angela Yao

AAAI 2022, pp. 2262-2270

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Temporal action segmentation classifies the action of each frame in (long) video sequences. Due to the high cost of framewise labeling, we propose the first semi-supervised method for temporal action segmentation. Our method hinges on unsupervised representation learning, which, for temporal action segmentation, poses unique challenges... We develop an Iterative Contrast-Classify (ICC) semi-supervised learning scheme. With more labelled data, ICC progressively improves in performance; ICC semi-supervised learning, with 40% labelled videos, performs similar to fully-supervised counterparts. Our ICC improves MoF by {+1.8, +5.6, +2.5}% on Breakfast, 50Salads and GTEA respectively for 100% labelled videos."
Researcher Affiliation | Academia | Dipika Singhania, Rahul Rahaman, Angela Yao. National University of Singapore. dipika16@comp.nus.edu.sg, rahul.rahaman@u.nus.edu, ayao@comp.nus.edu.sg
Pseudocode | No | The paper does not contain any explicit pseudocode blocks or sections labeled "Algorithm".
Open Source Code | No | The provided text contains no statement about open-sourcing the code for the described methodology and no link to a code repository.
Open Datasets | Yes | "We test on Breakfast Actions (Kuehne, Arslan, and Serre 2014) (1.7k videos, 10 complex activities, 48 actions), 50Salads (Stein and McKenna 2013) (50 videos, 19 actions) and GTEA (Fathi, Ren, and Rehg 2011) (28 videos, 11 actions)."
Dataset Splits | No | The paper mentions "specified train-test splits" and refers to a labelled dataset D_L and unlabelled videos D_U within the training process, but it does not explicitly specify a distinct validation set with percentages or sample counts, nor describe a methodology for creating one for hyperparameter tuning.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or cloud computing specifications.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions, or specific libraries with their versions).
Experiment Setup | Yes | "We sample frames from each video with K = {20, 60, 20} partitions, ε = 1/(3K) for sampling, and temporal proximity δ = {0.03, 0.5, 0.5} for Breakfast, 50Salads, and GTEA respectively. The contrastive temperature τ in Eqs. (3) and (4) is set to 0.1."
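
Since the paper provides neither pseudocode nor released code, the sketch below illustrates one plausible reading of the ICC scheme and its contrastive objective, using the temperature τ = 0.1 reported in the Experiment Setup row. All names here (info_nce_loss, icc_round, train_contrastive, train_classifier) and the PyTorch framing are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def info_nce_loss(anchors: torch.Tensor, positives: torch.Tensor,
                  temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style contrastive loss with the paper's temperature tau = 0.1.

    anchors, positives: (N, D) frame-feature tensors; positives[i] is the
    positive match for anchors[i], and every other row of `positives`
    serves as a negative for anchors[i].
    """
    a = F.normalize(anchors, dim=1)            # L2-normalise features
    p = F.normalize(positives, dim=1)
    logits = (a @ p.t()) / temperature         # (N, N) scaled cosine sims
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)    # diagonal entries = positives


def icc_round(encoder, classifier, labelled, unlabelled,
              train_contrastive, train_classifier):
    """One ICC iteration: contrast, then classify.

    `train_contrastive` and `train_classifier` are caller-supplied training
    routines (hypothetical placeholders, not functions from the paper).
    """
    # 1. Representation learning over all videos; positives come from
    #    temporally proximate frames (delta) and, in later rounds, frames
    #    sharing a pseudo-label from step 2.
    train_contrastive(encoder, labelled + unlabelled, loss_fn=info_nce_loss)
    # 2. Fit a frame-wise classifier on the labelled subset only.
    train_classifier(encoder, classifier, labelled)
    # 3. Pseudo-labels on the unlabelled videos then refine the positive/
    #    negative pairing for the next round (the "iterative" part of ICC).


# Toy usage of the loss: 128 frame features of dimension 64.
feats_a = torch.randn(128, 64)
feats_b = torch.randn(128, 64)
loss = info_nce_loss(feats_a, feats_b)         # scalar tensor
```

Dividing by a small temperature such as 0.1 sharpens the softmax over negatives, a standard design choice in contrastive learning; the iterative structure above reflects the paper's stated scheme of alternating representation learning and frame-wise classification, with the internals of each step left hypothetical.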