Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Smooth Regularization for Efficient Video Recognition

Authors: Gil Goldman, Raja Giryes, Mahadev Satyanarayanan

NeurIPS 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We demonstrate the effectiveness of our proposed smooth regularization on such lightweight models. To verify its benefits, we trained these smaller-scale networks on the popular Kinetics-600 dataset, a large benchmark known for its diversity of human action classes. By simply adding our novel loss function to the training of these architectures, we get consistent gains of 3.8%–6.4% in classification accuracy as shown in Figure 1, leading to new state-of-the-art performance under FLOP and memory constraints.
Researcher Affiliation Academia Gil Goldman Computer Science Department Carnegie Mellon University EMAIL Raja Giryes School of Electrical and Computer Engineering Tel-Aviv University EMAIL Mahadev Satyanarayanan Computer Science Department Carnegie Mellon University EMAIL
Pseudocode No The paper describes the mathematical formulation of the GRW smoothing term in Section 3.1 and its application to neural networks in Section 3.2, including equations and descriptive text, but it does not present a structured pseudocode or algorithm block.
Open Source Code Yes Our code and models are available at https://github.com/cmusatyalab/grw-smoothing.
Open Datasets Yes We report our results on Kinetics-600 (K600) [4] and Kinetics-400 (K400) [14]. Both datasets consist of 10-second videos of varying resolutions and frame rates, labeled with 600 and 400 action classes.
Dataset Splits No The paper mentions using Kinetics-600 and Kinetics-400 datasets for training and evaluation. While it references the datasets, it does not explicitly provide details about the specific training, validation, and test splits used (e.g., percentages, sample counts, or citations to predefined splits for these datasets).
Hardware Specification Yes The smaller models, A0–A1 and MobileNet, were trained on a single dgx-A100 for 3–5 days, while the A2 and A3 models were trained on 2 dgx-A100 for 5 days.
Software Dependencies No The paper discusses various models like MoViNets and MobileNetV3, and mentions using a 2-layer vanilla transformer, but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions).
Experiment Setup Yes We set λ = 10^-1 as the balancing factor in Equation (7) and α = 1/2 as the scaling factor in Equation (6). We used 5 fps with T=5 (covering 1 s), and for MoViNet-A3-GRW we use 12 fps with T=6 (covering 0.5 s). For these values of T we enumerate the full set of orderings in Equation (6), of size (T-1)! (i.e., 24 for T=5 and 120 for T=6), so no permutation subsampling was required (i.e., k was not used). We fine-tune for 14 epochs on K600 and 10 epochs on K400. We use different training rates for the transformer head and model backbone, decreasing with a cosine learning rate scheduler in the range [10^-4, 10^-6] for the model backbone and [10^-3, 10^-5] for the transformer head.
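
The arithmetic quoted in the Experiment Setup row can be checked with a short Python sketch. It is not taken from the paper's code, and the function names are hypothetical. Two things are assumptions: an "ordering" is modeled as a permutation of T-1 items only because that reproduces the quoted (T-1)! count, and the cosine schedule uses the standard annealing formula stepped once per epoch, since the row does not state the step granularity.

```python
import math
from itertools import permutations


def clip_seconds(num_frames: int, fps: float) -> float:
    """Duration covered by num_frames frames sampled at fps frames per second."""
    return num_frames / fps


def count_orderings(num_frames: int) -> int:
    """Count orderings by brute-force enumeration.

    Assumption: an "ordering" is a permutation of T-1 items, chosen only
    because it reproduces the (T-1)! count quoted in the table.
    """
    return sum(1 for _ in permutations(range(num_frames - 1)))


def cosine_lr(step: int, total_steps: int, lr_max: float, lr_min: float) -> float:
    """Standard cosine annealing from lr_max (step 0) to lr_min (last step)."""
    progress = step / max(total_steps - 1, 1)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # (T, fps) pairs quoted in the table.
    for frames, fps in [(5, 5), (6, 12)]:
        print(frames, fps, clip_seconds(frames, fps), count_orderings(frames))

    epochs = 14  # K600 fine-tuning length quoted in the table
    for epoch in range(epochs):
        backbone = cosine_lr(epoch, epochs, 1e-4, 1e-6)  # backbone range
        head = cosine_lr(epoch, epochs, 1e-3, 1e-5)      # transformer head range
        print(epoch, backbone, head)
```

Enumerating every ordering is cheap at these sizes (at most 120 items), which is consistent with the statement that no permutation subsampling was needed.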