Multi-dataset Training of Transformers for Robust Action Recognition
Authors: Junwei Liang, Enwei Zhang, Jun Zhang, Chunhua Shen
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify the effectiveness of our method on five challenging datasets: Kinetics-400, Kinetics-700, Moments-in-Time, ActivityNet, and Something-Something-v2. Extensive experimental results show that our method can consistently improve state-of-the-art performance. |
| Researcher Affiliation | Collaboration | Junwei Liang¹, Enwei Zhang², Jun Zhang², Chunhua Shen³ — ¹AI Thrust, Hong Kong University of Science and Technology (Guangzhou); ²Tencent Youtu Lab; ³Zhejiang University |
| Pseudocode | No | The paper provides mathematical equations for model components like the MViTv2 block and loss functions, but it does not present structured pseudocode or algorithm blocks for the overall method. |
| Open Source Code | Yes | Code and models are available at https://github.com/JunweiLiang/MultiTrain |
| Open Datasets | Yes | We verify the effectiveness of our method on five challenging datasets: Kinetics-400, Kinetics-700, Moments-in-Time, ActivityNet, and Something-Something-v2. |
| Dataset Splits | Yes | Kinetics-400 [28] (K400) consists of about 240K training videos and 20K validation videos in 400 human action classes. Kinetics-700 [7] (K700) extends the action classes to 700, with 545K training and 35K validation videos. The Moments-in-Time (MiT) dataset is one of the largest action datasets, with 727K training and 30K validation videos. |
| Hardware Specification | No | The paper states "See supplemental material" for compute and resource details, but the supplemental material is not provided in this context. Therefore, specific hardware details are not available in the main paper. |
| Software Dependencies | No | The paper mentions using MViTv2 and refers to the use of a PyTorch-based framework in supplementary material instructions (not provided), but it does not specify concrete version numbers for any software dependencies like PyTorch, Python, or other libraries in the main text. |
| Experiment Setup | No | The paper states 'Our models are trained from scratch with random initialization, without using any pre-training' and refers to supplementary material for more details on implementation, but it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed optimizer settings in the main text. |