TAda! Temporally-Adaptive Convolutions for Video Understanding
Authors: Ziyuan Huang, Shiwei Zhang, Liang Pan, Zhiwu Qing, Mingqian Tang, Ziwei Liu, Marcelo H Ang Jr
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We construct TAda2D and TAdaConvNeXt networks by replacing the 2D convolutions in ResNet and ConvNeXt with TAdaConv, which leads to at least on-par or better performance compared to state-of-the-art approaches on multiple video action recognition and localization benchmarks. We also demonstrate that as a readily plug-in operation with negligible computation overhead, TAdaConv can effectively improve many existing video models with a convincing margin. |
| Researcher Affiliation | Collaboration | 1Advanced Robotics Centre, National University of Singapore 2DAMO Academy, Alibaba Group 3S-Lab, Nanyang Technological University |
| Pseudocode | No | The paper describes procedures using text and mathematical formulations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Project page: https://tadaconv-iclr2022.github.io/. |
| Open Datasets | Yes | For video classification, we use Kinetics-400 (Kay et al., 2017), Something-Something V2 (Goyal et al., 2017), and Epic-Kitchens-100 (Damen et al., 2020). For action localization, we use HACS (Zhao et al., 2019) and Epic-Kitchens-100 (Damen et al., 2020). |
| Dataset Splits | No | The paper mentions training and evaluation processes, including frame sampling and data augmentation strategies (Appendix C.1), and uses 'validation' in its figures (Appendix I). However, it does not explicitly describe the dataset splits (e.g., percentages or sample counts for training, validation, and test sets) in the main text or appendix, nor does it explicitly state the use of standard splits for the cited datasets. |
| Hardware Specification | No | Our experiments on the action classification are conducted on three large-scale datasets. For all action classification models, we train them with synchronized SGD using 16 GPUs. |
| Software Dependencies | No | The paper mentions optimization algorithms like SGD and AdamW, and deep learning models (ResNet, ConvNeXt), but it does not specify version numbers for any software components such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Our experiments on the action classification are conducted on three large-scale datasets. For all action classification models, we train them with synchronized SGD using 16 GPUs. The batch size for each GPU is 16 and 8 respectively for 8-frame and 16-frame models... For all models, we use a dropout ratio (Hinton et al., 2012) of 0.5 before the classification heads. Spatially, we randomly resize the short side of the video to [256, 320] and crop a region of 224×224... On Kinetics-400, a half-period cosine schedule is applied for decaying the learning rate following Feichtenhofer et al. (2019), with the base learning rate set to 0.24 for ResNet-based models using SGD... The models are trained for 100 epochs. In the first 8 epochs, we adopt a linear warm-up strategy starting from a learning rate of 0.01. The weight decay is set to 1e-4. |
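
The learning-rate schedule quoted in the Experiment Setup row (linear warm-up from 0.01 over the first 8 epochs, then half-period cosine decay from a base rate of 0.24 over 100 epochs) can be sketched at epoch granularity. This is a minimal illustration, not the authors' code: the function name, the exact warm-up interpolation, and whether the paper steps the rate per iteration or per epoch are assumptions.

```python
import math

def lr_at_epoch(epoch, base_lr=0.24, warmup_epochs=8,
                warmup_start_lr=0.01, total_epochs=100):
    """Sketch of the reported schedule: linear warm-up from 0.01 over
    the first 8 epochs, then half-period cosine decay from the base
    rate (0.24 for ResNet-based models on Kinetics-400)."""
    if epoch < warmup_epochs:
        # Linear warm-up toward the base learning rate.
        alpha = epoch / warmup_epochs
        return warmup_start_lr + alpha * (base_lr - warmup_start_lr)
    # Half-period cosine decay over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at_epoch(0))    # 0.01 (warm-up start)
print(lr_at_epoch(8))    # 0.24 (base rate reached after warm-up)
print(lr_at_epoch(100))  # 0.0  (end of the cosine half-period)
```

The half-period cosine reaches exactly zero at the final epoch, which matches the "half-period cosine schedule" recipe the paper attributes to Feichtenhofer et al. (2019).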