Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Stable Mean Teacher for Semi-supervised Video Action Detection

Authors: Akash Kumar, Sirshapan Mitra, Yogesh Singh Rawat

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate our approach on four different spatio-temporal benchmarks: UCF101-24, JHMDB21, AVA, and YouTube-VOS. Our approach outperforms the supervised baselines for action detection by an average margin of 23.5% on UCF101-24, 16% on JHMDB21, and 3.3% on AVA. We perform a comprehensive evaluation on three different action detection benchmarks. Our study demonstrates significant improvement over supervised baselines, consistently outperforming the state-of-the-art approach for action detection (Figure 1). We also demonstrate the generalization capability of our approach to video object segmentation.
Researcher Affiliation Academia Akash Kumar, Sirshapan Mitra, Yogesh Singh Rawat Center for Research in Computer Vision, University of Central Florida EMAIL
Pseudocode No The paper describes the methodology in prose and mathematical equations, but it does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code Yes Code: https://github.com/AKASH2907/stable-mean-teacher | Project page: https://akash2907.github.io/smt webpage
Open Datasets Yes Datasets: We use four benchmark datasets to perform our experiments: UCF101-24 (2012), JHMDB21 (2013), and AVA v2.2 (2018) for action detection, and YouTube-VOS (2018c) to show generalization on video object segmentation (VOS).
Dataset Splits Yes Labeled and unlabeled setup: The labeled and unlabeled subsets for UCF101-24 and YouTube-VOS are divided in the ratio of 10:90, and for JHMDB21 it is 20:80. For the AVA dataset, we use 50% of the dataset for the semi-supervised setup, with a 10:40 labeled-to-unlabeled split.
Hardware Specification No The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies No The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup Yes Implementation details: We train the model for 50 epochs with a batch size of 8, where the number of samples from the labeled and unlabeled subsets is the same. The value of β for the EMA parameter update is set to 0.99, following prior works (2022; 2021). The value of λ for the unsupervised loss is set to 0.1, determined empirically.
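The reported hyperparameters (β = 0.99 for the EMA teacher update, λ = 0.1 weighting the unsupervised loss) follow the standard mean-teacher recipe. The sketch below illustrates those two pieces only; it is a minimal framework-free illustration, not the authors' implementation, and the function and parameter names are our own.

```python
def ema_update(teacher_params, student_params, beta=0.99):
    """Mean-teacher EMA step: teacher <- beta * teacher + (1 - beta) * student.

    Parameters are plain floats here for illustration; in practice this runs
    element-wise over the model's weight tensors after each optimizer step.
    """
    return [beta * t + (1.0 - beta) * s
            for t, s in zip(teacher_params, student_params)]

def total_loss(supervised_loss, unsupervised_loss, lam=0.1):
    """Combined objective: L = L_sup + lambda * L_unsup, with lambda = 0.1."""
    return supervised_loss + lam * unsupervised_loss
```

Because the teacher is a slowly moving average of the student (β close to 1), its predictions are more stable targets for the unsupervised consistency loss than the student's own outputs.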