TAda! Temporally-Adaptive Convolutions for Video Understanding

Authors: Ziyuan Huang, Shiwei Zhang, Liang Pan, Zhiwu Qing, Mingqian Tang, Ziwei Liu, Marcelo H. Ang Jr

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We construct TAda2D and TAdaConvNeXt networks by replacing the 2D convolutions in ResNet and ConvNeXt with TAdaConv, which leads to at least on-par or better performance compared to state-of-the-art approaches on multiple video action recognition and localization benchmarks. We also demonstrate that as a readily plug-in operation with negligible computation overhead, TAdaConv can effectively improve many existing video models with a convincing margin.
Researcher Affiliation | Collaboration | (1) Advanced Robotics Centre, National University of Singapore; (2) DAMO Academy, Alibaba Group; (3) S-Lab, Nanyang Technological University
Pseudocode | No | The paper describes procedures using text and mathematical formulations but does not include structured pseudocode or algorithm blocks. A hedged sketch of the core operation is given below.
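Since the operation is given only in prose and equations, the following is a minimal PyTorch sketch of the idea: each frame's kernel is the shared base kernel scaled by a per-frame calibration factor generated from temporally aggregated frame descriptors. The class name, the hidden-width reduction, and the calibration-branch layout (including applying the factor along the input-channel axis) are illustrative assumptions, not the authors' implementation; the official code on the project page is authoritative.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TAdaConv2dSketch(nn.Module):
    """Sketch of a temporally-adaptive 2D convolution: for frame t the
    effective kernel is W_t = alpha_t * W_base, with alpha_t produced
    from per-frame descriptors with temporal context."""

    def __init__(self, in_channels, out_channels, kernel_size=3, padding=1):
        super().__init__()
        self.padding = padding
        self.base_weight = nn.Parameter(
            torch.empty(out_channels, in_channels, kernel_size, kernel_size))
        nn.init.kaiming_normal_(self.base_weight)
        self.bias = nn.Parameter(torch.zeros(out_channels))
        # Calibration branch (layout assumed): temporal 1D convolutions over
        # per-frame descriptors give one factor per input channel per frame.
        hidden = max(in_channels // 4, 4)
        self.calibrate = nn.Sequential(
            nn.Conv1d(in_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv1d(hidden, in_channels, kernel_size=3, padding=1),
        )
        # Zero-init the last layer so alpha starts at 1 and the module
        # behaves like a plain shared-weight convolution at initialization.
        nn.init.zeros_(self.calibrate[-1].weight)
        nn.init.zeros_(self.calibrate[-1].bias)

    def forward(self, x):
        # x: (N, C_in, T, H, W)
        n, c, t, h, w = x.shape
        desc = x.mean(dim=(3, 4))           # frame descriptors, (N, C_in, T)
        alpha = 1.0 + self.calibrate(desc)  # per-frame factors, (N, C_in, T)
        # Scaling the input channels per frame is equivalent to scaling the
        # kernel along its C_in axis, since alpha is constant over (H, W).
        x = x * alpha[:, :, :, None, None]
        x = x.permute(0, 2, 1, 3, 4).reshape(n * t, c, h, w)
        y = F.conv2d(x, self.base_weight, self.bias, padding=self.padding)
        return y.reshape(n, t, -1, h, w).permute(0, 2, 1, 3, 4)

At this sketch level, swapping the spatial 3x3 convolutions in ResNet's residual blocks for such a module reflects how TAda2D is obtained from ResNet; the released implementation additionally handles strides, groups, and the exact calibration dimensions.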
Open Source Code | Yes | Project page: https://tadaconv-iclr2022.github.io/
Open Datasets | Yes | For video classification, we use Kinetics-400 (Kay et al., 2017), Something-Something V2 (Goyal et al., 2017), and Epic-Kitchens-100 (Damen et al., 2020). For action localization, we use HACS (Zhao et al., 2019) and Epic-Kitchens-100 (Damen et al., 2020).
Dataset Splits | No | The paper describes training and evaluation procedures, including frame sampling and data augmentation (Appendix C.1), and refers to 'validation' in its figures (Appendix I). However, it neither describes the dataset splits (e.g., percentages or sample counts for training, validation, and test sets) in the main text or appendix, nor explicitly states that the standard splits of the cited datasets are used.
Hardware Specification | No | Our experiments on the action classification are conducted on three large-scale datasets. For all action classification models, we train them with synchronized SGD using 16 GPUs. (The number of GPUs is given, but the GPU model and memory are not specified.)
Software Dependencies | No | The paper mentions optimization algorithms (SGD, AdamW) and model architectures (ResNet, ConvNeXt), but it does not specify version numbers for any software components such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | Our experiments on the action classification are conducted on three large-scale datasets. For all action classification models, we train them with synchronized SGD using 16 GPUs. The batch size for each GPU is 16 and 8 respectively for 8-frame and 16-frame models... For all models, we use a dropout ratio (Hinton et al., 2012) of 0.5 before the classification heads. Spatially, we randomly resize the short side of the video to [256, 320] and crop a region of 224 × 224... On Kinetics-400, a half-period cosine schedule is applied for decaying the learning rate following Feichtenhofer et al. (2019), with the base learning rate set to 0.24 for ResNet-based models using SGD... The models are trained for 100 epochs. In the first 8 epochs, we adopt a linear warm-up strategy starting from a learning rate of 0.01. The weight decay is set to 1e-4.
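The quoted learning-rate schedule is simple to reconstruct. Below is a minimal sketch, assuming the warm-up ramps linearly from 0.01 to the base rate and the half-period cosine decays over the epochs remaining after warm-up; the quote fixes the constants (base rate 0.24, 8 warm-up epochs, 100 total epochs) but not these interpolation details, so they are assumptions.

import math

def lr_at_epoch(epoch, base_lr=0.24, warmup_epochs=8,
                warmup_start=0.01, total_epochs=100):
    """Per-epoch learning rate: linear warm-up, then half-period cosine."""
    if epoch < warmup_epochs:
        # Linear ramp from the warm-up starting rate toward the base rate.
        return warmup_start + (base_lr - warmup_start) * epoch / warmup_epochs
    # Half-period cosine: decays from base_lr toward 0 over remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

# Example: inspect the rate at a few points in training.
for e in (0, 4, 8, 50, 99):
    print(e, round(lr_at_epoch(e), 4))

In practice this would be applied per iteration rather than per epoch, and combined with SGD, weight decay 1e-4, and the batch sizes quoted above.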