Multi-dataset Training of Transformers for Robust Action Recognition

Authors: Junwei Liang, Enwei Zhang, Jun Zhang, Chunhua Shen

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We verify the effectiveness of our method on five challenging datasets: Kinetics-400, Kinetics-700, Moments-in-Time, ActivityNet and Something-Something-v2. Extensive experimental results show that our method can consistently improve state-of-the-art performance."
Researcher Affiliation | Collaboration | Junwei Liang (1), Enwei Zhang (2), Jun Zhang (2), Chunhua Shen (3); (1) AI Thrust, Hong Kong University of Science and Technology (Guangzhou); (2) Tencent Youtu Lab; (3) Zhejiang University
Pseudocode | No | The paper provides mathematical equations for model components such as the MViTv2 block and the loss functions, but it does not present structured pseudocode or algorithm blocks for the overall method.
Open Source Code | Yes | Code and models are available at https://github.com/JunweiLiang/MultiTrain
Open Datasets | Yes | "We verify the effectiveness of our method on five challenging datasets: Kinetics-400, Kinetics-700, Moments-in-Time, ActivityNet and Something-Something-v2."
Dataset Splits | Yes | "Kinetics-400 [28] (K400) consists of about 240K training videos and 20K validation videos in 400 human action classes. Kinetics-700 [7] (K700) extends the action classes to 700, with 545K training and 35K validation videos. The Moments-in-Time (MiT) dataset is one of the largest action datasets, with 727K training and 30K validation videos."
Hardware Specification | No | The paper states "See supplemental material" for compute and resource details, but the supplemental material is not provided in this context, so specific hardware details are not available in the main paper.
Software Dependencies | No | The paper mentions using MViTv2 and refers to a PyTorch-based framework in the supplementary material (not provided), but it does not specify concrete version numbers for any software dependencies such as PyTorch or Python in the main text.
Experiment Setup | No | The paper states "Our models are trained from scratch with random initialization, without using any pre-training" and refers to the supplementary material for implementation details, but it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed optimizer settings in the main text.
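To make the multi-dataset setting above concrete, the sketch below shows the generic pattern of one shared backbone with a separate classification head per dataset. This is a toy illustration under stated assumptions, not the paper's MultiTrain method (which builds on MViTv2 with additional losses); the class name, feature dimension, and the reduced class counts for the hypothetical datasets are all illustrative choices, except that 400/700 match K400/K700 and 339 is the standard Moments-in-Time label count.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dataset-to-class-count mapping for illustration only.
DATASETS = {"K400": 400, "K700": 700, "MiT": 339}


class SharedBackboneMultiHead:
    """Toy multi-dataset classifier: one shared backbone, one head per dataset.

    Illustrates the shared-representation idea only; the real method uses an
    MViTv2 video backbone and dedicated multi-dataset training losses.
    """

    def __init__(self, in_dim=128, feat_dim=64, datasets=DATASETS):
        # Shared "backbone": a single random linear layer with ReLU.
        self.W_backbone = rng.standard_normal((in_dim, feat_dim)) * 0.01
        # One linear classification head per dataset, sized to its label set.
        self.heads = {
            name: rng.standard_normal((feat_dim, n_classes)) * 0.01
            for name, n_classes in datasets.items()
        }

    def forward(self, x, dataset):
        feat = np.maximum(x @ self.W_backbone, 0.0)  # shared features
        return feat @ self.heads[dataset]            # dataset-specific logits


model = SharedBackboneMultiHead()
batch = rng.standard_normal((8, 128))  # stand-in for a batch of video features
for name, n_classes in DATASETS.items():
    logits = model.forward(batch, name)
    assert logits.shape == (8, n_classes)
```

During training, each mini-batch would be routed through the head matching its source dataset while gradients update the shared backbone, which is what lets one model serve all five benchmarks.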