Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Stable Mean Teacher for Semi-supervised Video Action Detection
Authors: Akash Kumar, Sirshapan Mitra, Yogesh Singh Rawat
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on four different spatio-temporal detection benchmarks: UCF101-24, JHMDB21, AVA, and YouTube-VOS. Our approach outperforms the supervised baselines for action detection by an average margin of 23.5% on UCF101-24, 16% on JHMDB21, and 3.3% on AVA. We perform a comprehensive evaluation on three different action detection benchmarks. Our study demonstrates significant improvement over supervised baselines, consistently outperforming the state-of-the-art approach for action detection (Figure 1). We also demonstrate the generalization capability of our approach to video object segmentation. |
| Researcher Affiliation | Academia | Akash Kumar, Sirshapan Mitra, Yogesh Singh Rawat Center for Research in Computer Vision, University of Central Florida EMAIL |
| Pseudocode | No | The paper describes the methodology in prose and mathematical equations, but it does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/AKASH2907/stable-mean-teacher Project Page: https://akash2907.github.io/smt |
| Open Datasets | Yes | Datasets: We use four benchmark datasets to perform our experiments; UCF101-24 (2012), JHMDB21 (2013), and AVA v2.2 (AVA) (2018) for action detection, and YouTube-VOS (2018c) to show generalization on video segmentation (VOS). |
| Dataset Splits | Yes | Labeled and unlabeled setup: The labeled and unlabeled subsets for UCF101-24 and YouTube-VOS are divided in the ratio of 10:90, and for JHMDB21 it is 20:80. For the AVA dataset, we use 50% of the dataset for the semi-supervised setup, with a 10:40 ratio of labeled to unlabeled data. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | Implementation details We train the model for 50 epochs with a batch size of 8 where the number of samples from both labeled and unlabeled subsets are the same. The value of β for EMA parameters update is set to 0.99 which follows prior works (2022; 2021). The value of λ for the unsupervised loss is set to 0.1 determined empirically. |
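The Experiment Setup row names two quantities that define mean-teacher training: the EMA decay β = 0.99 for the teacher update and the weight λ = 0.1 on the unsupervised loss. A minimal sketch of those two updates, with illustrative names not taken from the paper's code, could look like:

```python
# Hedged sketch of the mean-teacher update rules reported in the
# "Experiment Setup" row. Function and variable names are hypothetical;
# only the constants (beta = 0.99, lambda = 0.1) come from the paper.

BETA = 0.99        # EMA decay for the teacher parameter update
LAMBDA_UNSUP = 0.1 # weight on the unsupervised loss

def ema_update(teacher_params, student_params, beta=BETA):
    """One EMA step: teacher <- beta * teacher + (1 - beta) * student."""
    return [beta * t + (1.0 - beta) * s
            for t, s in zip(teacher_params, student_params)]

def total_loss(supervised_loss, unsupervised_loss, lam=LAMBDA_UNSUP):
    """Combined objective: L = L_sup + lambda * L_unsup."""
    return supervised_loss + lam * unsupervised_loss

# Example: one teacher update and one loss combination
teacher = [0.0, 1.0]
student = [1.0, 1.0]
teacher = ema_update(teacher, student)
loss = total_loss(2.0, 5.0)
```

With β this close to 1, the teacher changes only slightly per step (here the first weight moves from 0.0 to about 0.01), which is what gives the teacher its stability relative to the student.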