Learning Comprehensive Motion Representation for Action Recognition

Authors: Mingyu Wu, Boyuan Jiang, Donghao Luo, Junchi Yan, Yabiao Wang, Ying Tai, Chengjie Wang, Jilin Li, Feiyue Huang, Xiaokang Yang

AAAI 2021, pp. 2934-2942

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test our method on three large-scale benchmark datasets, i.e., Something-Something V1 & V2 (Goyal et al. 2017) and Kinetics-400 (Kay et al. 2017). Furthermore, the hyperparameter in our method is discussed. We also conduct an ablation study on the temporal reasoning dataset Something-Something V1 to analyze CME's and SME's performance individually and visualize each part's effect. Finally, we give a runtime analysis to show the efficiency of our method compared with state-of-the-art methods.
Researcher Affiliation | Collaboration | 1) MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University; 2) Department of Computer Science and Engineering, Shanghai Jiao Tong University; 3) Youtu Lab, Tencent
Pseudocode | No | The paper describes the proposed modules and framework using text and mathematical equations, but it does not include formal pseudocode or algorithm blocks.
Open Source Code | No | The paper does not include an explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | We test our method on three large-scale benchmark datasets, i.e., Something-Something V1 & V2 (Goyal et al. 2017) and Kinetics-400 (Kay et al. 2017).
Dataset Splits | Yes | The subscripts of Val and Test indicate dataset version and top-1 accuracy is reported.
Hardware Specification | Yes | We follow the inference settings in (Lin, Gan, and Han 2019) by using a single NVIDIA Tesla V100 GPU to measure the latency and throughput. (A latency-measurement sketch is given after the table.)
Software Dependencies | No | The paper mentions using ResNet-50 and ImageNet pre-training but does not specify software versions for libraries such as PyTorch, TensorFlow, or Python.
Experiment Setup | Yes | For the Something-Something dataset, we train the model for 50 epochs, set the initial learning rate to 0.01, and reduce it by a factor of 10 at epochs 30, 40, and 45. For Kinetics-400, our model is trained for 100 epochs; the initial learning rate is set to 0.01 and is reduced by a factor of 10 at epochs 50, 75, and 90. Stochastic Gradient Descent (SGD) with momentum 0.9 is used as the optimizer, and the batch size is 64 for all three datasets. (A training-schedule sketch follows below.)
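
The hardware row above states only that a single Tesla V100 was used to measure latency and throughput, following the inference settings of (Lin, Gan, and Han 2019). For reference, below is a minimal PyTorch sketch of how such a measurement is typically done; the stand-in model, input shape, and iteration counts are assumptions for illustration, not details taken from the paper.

```python
import time
import torch

# Latency/throughput measurement sketch (assumed protocol, not the authors'
# exact script). Requires a CUDA-capable GPU, e.g., the Tesla V100 cited above.
model = torch.nn.Identity().cuda().eval()              # stand-in for the action-recognition network
clip = torch.randn(1, 8, 3, 224, 224, device="cuda")   # 1 video, 8 frames, 224x224 (assumed shape)

with torch.no_grad():
    for _ in range(10):                                 # warm-up iterations
        model(clip)
    torch.cuda.synchronize()

    n_runs = 100
    start = time.time()
    for _ in range(n_runs):
        model(clip)
    torch.cuda.synchronize()                            # wait for all GPU work before stopping the clock
    elapsed = time.time() - start

latency_ms = elapsed / n_runs * 1000
throughput = n_runs * clip.shape[0] / elapsed           # videos processed per second
print(f"latency: {latency_ms:.2f} ms, throughput: {throughput:.1f} videos/s")
```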
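
The experiment-setup row maps directly onto a standard SGD plus step-decay schedule. The sketch below expresses those reported hyperparameters (Something-Something settings) in PyTorch form; the placeholder model and the omitted data pipeline are illustrative assumptions, not part of the paper.

```python
import torch

# Reported optimization schedule for Something-Something (sketch only).
model = torch.nn.Linear(2048, 174)   # placeholder; 174 classes as in Something-Something

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.01,        # initial learning rate reported in the paper
    momentum=0.9,   # SGD momentum reported in the paper
)
# Reduce the learning rate by a factor of 10 at epochs 30, 40, and 45 (50 epochs total).
# For Kinetics-400 the paper instead trains 100 epochs with milestones [50, 75, 90].
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[30, 40, 45], gamma=0.1
)

batch_size = 64  # reported batch size for all three datasets
for epoch in range(50):
    # ... per-batch forward/backward passes and optimizer.step() go here ...
    scheduler.step()
```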