AIM: Adapting Image Models for Efficient Video Action Recognition
Authors: Taojiannan Yang, Yi Zhu, Yusheng Xie, Aston Zhang, Chen Chen, Mu Li
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed method on four widely adopted video action recognition benchmarks, Kinetics-400 (K400) (Kay et al., 2017), Kinetics-700 (K700) (Carreira et al., 2019), Something-something-v2 (SSv2) (Goyal et al., 2017) and Diving-48 (Li et al., 2018). |
| Researcher Affiliation | Collaboration | Taojiannan Yang¹, Yi Zhu², Yusheng Xie², Aston Zhang², Chen Chen¹, Mu Li²; ¹Center for Research in Computer Vision, University of Central Florida; ²Amazon Web Services |
| Pseudocode | Yes | Algorithm 1: Pseudo-code of an adapted ViT block (`class TransformerBlock():`); a hedged sketch of this block is given after the table. |
| Open Source Code | Yes | The project webpage is https://adapt-image-models.github.io/. |
| Open Datasets | Yes | We evaluate the proposed method on four widely adopted video action recognition benchmarks, Kinetics-400 (K400) (Kay et al., 2017), Kinetics-700 (K700) (Carreira et al., 2019), Something-something-v2 (SSv2) (Goyal et al., 2017) and Diving-48 (Li et al., 2018). |
| Dataset Splits | Yes | K400 contains around 240K training videos and 20K validation videos in 400 human action classes. K700 is an extended version of K400 which contains around 530K training videos and 34K validation videos in 700 classes. SSv2 contains 168.9K training videos and 24.7K validation videos in 174 classes. Diving-48 contains 15.9K training videos and 2K validation videos in 48 fine-grained diving actions. |
| Hardware Specification | Yes | All metrics are measured on 8 Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer, with RandAugment and random erasing for data augmentation, citing the original papers for these methods. However, it does not specify version numbers for any software libraries or programming languages used in the implementation (e.g., Python, PyTorch, TensorFlow, or CUDA versions). |
| Experiment Setup | Yes | The model is trained for 30 epochs using the AdamW (Kingma & Ba, 2014) optimizer with a batch size of 64. The base learning rate is 3e-4 and the weight decay is 5e-2. The learning rate is warmed up from 0 over the first 3 epochs and then decays following a cosine schedule. The stochastic depth rate is 0.2 for both ViT-B and ViT-L (a schedule sketch is given below). |
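The pseudocode row above refers to the paper's Algorithm 1, which inserts lightweight bottleneck adapters into an otherwise frozen ViT block and reuses the frozen spatial attention along the time axis for temporal modeling. The following is a minimal PyTorch sketch of that structure; the `Adapter` bottleneck ratio, the `scale` constant, and details such as which adapters keep an internal skip connection are illustrative assumptions, not taken from the released code.

```python
import torch
import torch.nn as nn
from einops import rearrange

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, GELU, up-project.
    The inner skip connection and bottleneck ratio are assumptions."""
    def __init__(self, dim, ratio=0.25, skip=True):
        super().__init__()
        hidden = int(dim * ratio)
        self.skip = skip
        self.down = nn.Linear(dim, hidden)
        self.act = nn.GELU()
        self.up = nn.Linear(hidden, dim)

    def forward(self, x):
        out = self.up(self.act(self.down(x)))
        return x + out if self.skip else out

class AdaptedViTBlock(nn.Module):
    """One pretrained ViT block with AIM-style temporal, spatial and
    joint adaptation. Only the adapters are meant to be trained; the
    attention, MLP and norm weights stay frozen."""
    def __init__(self, dim, num_heads, num_frames, mlp_ratio=4.0, scale=0.5):
        super().__init__()
        self.num_frames = num_frames
        self.scale = scale  # illustrative weighting of the joint adapter
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(dim, int(dim * mlp_ratio)), nn.GELU(),
            nn.Linear(int(dim * mlp_ratio), dim))
        self.t_adapter = Adapter(dim, skip=False)    # temporal adaptation
        self.s_adapter = Adapter(dim)                # spatial adaptation
        self.mlp_adapter = Adapter(dim, skip=False)  # joint adaptation

    def forward(self, x):
        # x: (batch * frames, tokens, dim), as produced by a 2D patch embed
        bt, n, d = x.shape
        t = self.num_frames
        # Temporal adaptation: run the *same* frozen attention along time.
        xt = rearrange(x, '(b t) n d -> (b n) t d', t=t)
        xt = self.norm1(xt)
        xt = self.t_adapter(self.attn(xt, xt, xt, need_weights=False)[0])
        x = x + rearrange(xt, '(b n) t d -> (b t) n d', n=n)
        # Spatial adaptation: the original spatial attention plus an adapter.
        xs = self.norm1(x)
        x = x + self.s_adapter(self.attn(xs, xs, xs, need_weights=False)[0])
        # Joint adaptation: an adapter in parallel with the frozen MLP.
        xm = self.norm2(x)
        return x + self.mlp(xm) + self.scale * self.mlp_adapter(xm)
```

Freezing the backbone is then a one-liner over the full model, e.g. setting `p.requires_grad = 'adapter' in name` in a loop over `model.named_parameters()`, so that only the adapters (and the classification head) receive gradients.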
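The quoted training recipe maps directly onto a standard PyTorch optimizer/scheduler pair. Below is a minimal sketch under the stated hyperparameters (30 epochs, batch size 64, base learning rate 3e-4, weight decay 5e-2, 3 warmup epochs, cosine decay); the stand-in `model` and the `lr_scale` helper are illustrative, and the released code may differ in details such as per-step versus per-epoch scheduling.

```python
import math
import torch
import torch.nn as nn

# Hyperparameters quoted in the Experiment Setup row.
EPOCHS, WARMUP_EPOCHS = 30, 3
BASE_LR, WEIGHT_DECAY = 3e-4, 5e-2  # batch size 64, stochastic depth 0.2

model = nn.Linear(768, 400)  # stand-in for the adapted ViT; in AIM only
                             # adapter/head parameters would be trainable
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad),
    lr=BASE_LR, weight_decay=WEIGHT_DECAY)

def lr_scale(epoch: int) -> float:
    """Warm up linearly from 0 over the first 3 epochs, then decay
    following a cosine schedule, as the paper describes."""
    if epoch < WARMUP_EPOCHS:
        return epoch / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / (EPOCHS - WARMUP_EPOCHS)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_scale)

for epoch in range(EPOCHS):
    # ... one pass over the training set with batch size 64 ...
    scheduler.step()
```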