AIM: Adapting Image Models for Efficient Video Action Recognition
Authors: Taojiannan Yang, Yi Zhu, Yusheng Xie, Aston Zhang, Chen Chen, Mu Li
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed method on four widely adopted video action recognition benchmarks, Kinetics-400 (K400) (Kay et al., 2017), Kinetics-700 (K700) (Carreira et al., 2019), Something-something-v2 (SSv2) (Goyal et al., 2017) and Diving-48 (Li et al., 2018). |
| Researcher Affiliation | Collaboration | Taojiannan Yang¹, Yi Zhu², Yusheng Xie², Aston Zhang², Chen Chen¹, Mu Li²; ¹Center for Research in Computer Vision, University of Central Florida; ²Amazon Web Services |
| Pseudocode | Yes | Algorithm 1: Pseudo-code of an adapted ViT block (`class TransformerBlock():`); a hedged sketch of this block is given after the table. |
| Open Source Code | Yes | The project webpage is https://adapt-image-models.github.io/. |
| Open Datasets | Yes | We evaluate the proposed method on four widely adopted video action recognition benchmarks, Kinetics-400 (K400) (Kay et al., 2017), Kinetics-700 (K700) (Carreira et al., 2019), Something-something-v2 (SSv2) (Goyal et al., 2017) and Diving-48 (Li et al., 2018). |
| Dataset Splits | Yes | K400 contains around 240K training videos and 20K validation videos in 400 human action classes. K700 is an extended version of K400 which contains around 530K training videos and 34K validation videos in 700 classes. SSv2 contains 168.9K training videos and 24.7K validation videos in 174 classes. Diving-48 contains 15.9K training videos and 2K validation videos in 48 fine-grained diving actions. |
| Hardware Specification | Yes | All metrics are measured on 8 Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer, with RandAugment and random erasing for data augmentation, citing the original papers for these methods. However, it does not specify version numbers for any software libraries or programming languages used in the implementation (e.g., Python, PyTorch, TensorFlow, or CUDA versions). |
| Experiment Setup | Yes | The model is trained for 30 epochs using the AdamW (Kingma & Ba, 2014) optimizer with a batch size of 64. The base learning rate is 3e-4 and the weight decay is 5e-2. The learning rate is warmed up from 0 over the first 3 epochs and then decays following a cosine schedule. The stochastic depth rate is 0.2 for both ViT-B and ViT-L (a schedule sketch is given below). |
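The pseudocode row above refers to the paper's Algorithm 1, which inserts lightweight bottleneck adapters into an otherwise frozen ViT block and reuses the frozen spatial attention along the time axis for temporal modeling. The following is a minimal PyTorch sketch of that structure; the `Adapter` bottleneck ratio, the `scale` constant, and details such as which adapters keep an internal skip connection are illustrative assumptions, not taken from the released code.

```python
import torch
import torch.nn as nn
from einops import rearrange

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, GELU, up-project.
    The inner skip connection and bottleneck ratio are assumptions."""
    def __init__(self, dim, ratio=0.25, skip=True):
        super().__init__()
        hidden = int(dim * ratio)
        self.skip = skip
        self.down = nn.Linear(dim, hidden)
        self.act = nn.GELU()
        self.up = nn.Linear(hidden, dim)

    def forward(self, x):
        out = self.up(self.act(self.down(x)))
        return x + out if self.skip else out

class AdaptedViTBlock(nn.Module):
    """One pretrained ViT block with AIM-style temporal, spatial and
    joint adaptation. Only the adapters are meant to be trained; the
    attention, MLP and norm weights stay frozen."""
    def __init__(self, dim, num_heads, num_frames, mlp_ratio=4.0, scale=0.5):
        super().__init__()
        self.num_frames = num_frames
        self.scale = scale  # illustrative weighting of the joint adapter
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(dim, int(dim * mlp_ratio)), nn.GELU(),
            nn.Linear(int(dim * mlp_ratio), dim))
        self.t_adapter = Adapter(dim, skip=False)    # temporal adaptation
        self.s_adapter = Adapter(dim)                # spatial adaptation
        self.mlp_adapter = Adapter(dim, skip=False)  # joint adaptation

    def forward(self, x):
        # x: (batch * frames, tokens, dim), as produced by a 2D patch embed
        bt, n, d = x.shape
        t = self.num_frames
        # Temporal adaptation: run the *same* frozen attention along time.
        xt = rearrange(x, '(b t) n d -> (b n) t d', t=t)
        xt = self.norm1(xt)
        xt = self.t_adapter(self.attn(xt, xt, xt, need_weights=False)[0])
        x = x + rearrange(xt, '(b n) t d -> (b t) n d', n=n)
        # Spatial adaptation: the original spatial attention plus an adapter.
        xs = self.norm1(x)
        x = x + self.s_adapter(self.attn(xs, xs, xs, need_weights=False)[0])
        # Joint adaptation: an adapter in parallel with the frozen MLP.
        xm = self.norm2(x)
        return x + self.mlp(xm) + self.scale * self.mlp_adapter(xm)
```

Freezing the backbone is then a one-liner over the full model, e.g. setting `p.requires_grad = 'adapter' in name` in a loop over `model.named_parameters()`, so that only the adapters (and the classification head) receive gradients.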
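The quoted training recipe maps directly onto a standard PyTorch optimizer/scheduler pair. Below is a minimal sketch under the stated hyperparameters (30 epochs, batch size 64, base learning rate 3e-4, weight decay 5e-2, 3 warmup epochs, cosine decay); the stand-in `model` and the `lr_scale` helper are illustrative, and the released code may differ in details such as per-step versus per-epoch scheduling.

```python
import math
import torch
import torch.nn as nn

# Hyperparameters quoted in the Experiment Setup row.
EPOCHS, WARMUP_EPOCHS = 30, 3
BASE_LR, WEIGHT_DECAY = 3e-4, 5e-2  # batch size 64, stochastic depth 0.2

model = nn.Linear(768, 400)  # stand-in for the adapted ViT; in AIM only
                             # adapter/head parameters would be trainable
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad),
    lr=BASE_LR, weight_decay=WEIGHT_DECAY)

def lr_scale(epoch: int) -> float:
    """Warm up linearly from 0 over the first 3 epochs, then decay
    following a cosine schedule, as the paper describes."""
    if epoch < WARMUP_EPOCHS:
        return epoch / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / (EPOCHS - WARMUP_EPOCHS)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_scale)

for epoch in range(EPOCHS):
    # ... one pass over the training set with batch size 64 ...
    scheduler.step()
```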