Are all Frames Equal? Active Sparse Labeling for Video Action Detection

Authors: Aayush Rana, Yogesh Rawat

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed approach on two different action detection benchmark datasets, UCF101-24 and J-HMDB-21, and observed that active sparse labeling can be very effective in saving annotation costs.
Researcher Affiliation | Academia | Aayush J. Rana (aayushjr@knights.ucf.edu) and Yogesh S. Rawat (yogesh@crcv.ucf.edu), Center for Research in Computer Vision (CRCV), University of Central Florida
Pseudocode | Yes | 'The entire selection algorithm is provided in the Appendix.' (An illustrative sketch of such a selection step follows the table.)
Open Source Code | Yes | Project details available at https://sites.google.com/view/activesparselabeling/home
Open Datasets | Yes | We evaluate our approach on three different datasets, UCF-101 [13], J-HMDB [14] and YouTube-VOS [77].
Dataset Splits | No | The paper names the datasets used (UCF-101, J-HMDB, YouTube-VOS) and the percentages of annotated frames used for training (e.g., '1% of labelled frames'), but it does not explicitly provide the train/validation/test splits needed for reproduction.
Hardware Specification | No | The paper's checklist states 'Detailed in supplementary' for hardware specifications, but these details are not provided in the main text of the paper.
Software Dependencies | No | The paper states 'We implement our method in PyTorch [81]', but it does not specify the PyTorch version or any other software dependencies.
Experiment Setup | Yes | 'We use Adam optimizer [83] with a batch size of 8 and train for 22K iterations in each active learning cycle (details in appendix G.3). We use dropout for generating uncertainty similar to [73] by enabling it during inference. For the YouTube-VOS task, we use two existing methods [77, 84]. We use τ = 0.9 for non-active suppression and σ = 1.3 for Eq. 2 and Eq. 4, which were empirically determined (details in appendix C).' (An illustrative training-loop sketch also follows the table.)
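The paper's actual selection algorithm is only referenced here (it lives in the appendix), so the following is a minimal sketch of what uncertainty-driven frame selection with Monte Carlo dropout could look like, based on the quoted setup of enabling dropout during inference. The helper names (`mc_dropout_uncertainty`, `select_frames`), the assumed model output shape, the variance-based score, and the budget logic are illustrative assumptions, not the authors' released code.

```python
import torch

def mc_dropout_uncertainty(model, clip, n_passes=10):
    """Per-frame uncertainty via Monte Carlo dropout (illustrative sketch).

    Assumes `model(clip)` returns a per-frame localization map of shape
    [1, T, H, W] for a clip of T frames; the variance-based score below is
    an assumption, not the paper's exact formulation.
    """
    model.eval()
    # Re-enable only the dropout modules, mirroring the quoted trick of
    # keeping dropout active during inference to sample stochastic outputs.
    for module in model.modules():
        if isinstance(module, (torch.nn.Dropout, torch.nn.Dropout2d, torch.nn.Dropout3d)):
            module.train()
    with torch.no_grad():
        preds = torch.stack([model(clip) for _ in range(n_passes)])  # [N, 1, T, H, W]
    # Score each frame by the mean spatial variance across the N passes.
    return preds.var(dim=0).mean(dim=(0, 2, 3))  # [T]

def select_frames(scores, already_labeled, budget):
    """Pick the `budget` most uncertain frames not yet labeled (hypothetical)."""
    scores = scores.clone()
    if already_labeled:
        scores[torch.tensor(sorted(already_labeled))] = float("-inf")
    return torch.topk(scores, k=budget).indices.tolist()
```

Keeping the network in eval mode while switching only the dropout modules back to train mode is the standard way to realize "dropout enabled during inference" in PyTorch; the per-frame scoring and top-k budget step stand in for the paper's appendix algorithm.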
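Similarly, a hedged sketch of one active learning cycle using only the quoted hyperparameters (Adam optimizer, batch size 8, 22K iterations per cycle). The model, dataset, loss, and learning rate below are placeholders, not taken from the paper; the learning rate is left at PyTorch's Adam default, which is an assumption.

```python
import torch
from torch.utils.data import DataLoader

def run_active_learning_cycle(model, dataset, criterion,
                              iterations=22_000, batch_size=8):
    """Train for one cycle with the quoted hyperparameters (sketch).

    `model`, `dataset`, and `criterion` are placeholders; only Adam,
    batch size 8, and 22K iterations come from the paper's quoted setup.
    """
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters())  # lr left at default (assumption)
    model.train()
    step = 0
    while step < iterations:
        for clips, targets in loader:
            optimizer.zero_grad()
            loss = criterion(model(clips), targets)
            loss.backward()
            optimizer.step()
            step += 1
            if step >= iterations:
                return
```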