Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Stable Mean Teacher for Semi-supervised Video Action Detection
Authors: Akash Kumar, Sirshapan Mitra, Yogesh Singh Rawat
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on four different spatio-temporal detection benchmarks, UCF101-24, JHMDB21, AVA, and Youtube-VOS. Our approach outperforms the supervised baselines for action detection by an average margin of 23.5% on UCF101-24, 16% on JHMDB21, and 3.3% on AVA. We perform a comprehensive evaluation on three different action detection benchmarks. Our study demonstrates significant improvement over supervised baselines, consistently outperforming the state-of-the-art approach for action detection (Figure 1). We also demonstrate the generalization capability of our approach to video object segmentation. |
| Researcher Affiliation | Academia | Akash Kumar, Sirshapan Mitra, Yogesh Singh Rawat Center for Research in Computer Vision, University of Central Florida EMAIL |
| Pseudocode | No | The paper describes the methodology in prose and mathematical equations, but it does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code https://github.com/AKASH2907/stable-mean-teacher Project Page https://akash2907.github.io/smt webpage |
| Open Datasets | Yes | Datasets: We use four benchmark datasets to perform our experiments; UCF101-24 (2012), JHMDB21 (2013), and AVA v2.2 (AVA) (2018) for action detection, and Youtube-VOS (2018c) to show generalization on video segmentation (VOS). |
| Dataset Splits | Yes | Labeled and unlabeled setup: The labeled and unlabeled subset for UCF101-24 and Youtube-VOS is divided in the ratio of 10:90 and for JHMDB21 it is 20:80. For the AVA dataset, we use 50% of the dataset for semi-sup setup. We utilize a 10:40 split between labeled to unlabeled ratio. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | Implementation details: We train the model for 50 epochs with a batch size of 8 where the number of samples from both labeled and unlabeled subsets are the same. The value of β for EMA parameters update is set to 0.99 which follows prior works (2022; 2021). The value of λ for the unsupervised loss is set to 0.1 determined empirically. |
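
The Experiment Setup row reports an EMA coefficient β = 0.99 and an unsupervised loss weight λ = 0.1. As a rough illustration of how such values are typically used in a mean-teacher setup, the sketch below applies the standard EMA rule (teacher ← β · teacher + (1 − β) · student) and assumes the total loss is the supervised term plus λ times the unsupervised term. The function names, the plain-list parameters, and the additive loss form are assumptions made for illustration and are not taken from the authors' code.

```python
# Hypothetical sketch; names and structure are illustrative assumptions,
# not the authors' implementation.
BETA = 0.99      # EMA coefficient for the teacher update (value from the table)
LAMBDA_U = 0.1   # weight of the unsupervised loss (value from the table)


def ema_update(teacher, student, beta=BETA):
    """Return updated teacher weights: beta * teacher + (1 - beta) * student."""
    return [beta * t + (1.0 - beta) * s for t, s in zip(teacher, student)]


def total_loss(supervised, unsupervised, lam=LAMBDA_U):
    """Assumed combination: supervised loss plus lam times unsupervised loss."""
    return supervised + lam * unsupervised


# Toy parameters standing in for model weights.
teacher = [1.0, 0.0]
student = [0.0, 1.0]
teacher = ema_update(teacher, student)
loss = total_loss(1.0, 2.0)
```

With β this close to 1, the teacher moves only slightly toward the student at each step, which is the smoothing behaviour the EMA update is meant to provide.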