Knowledge Integration Networks for Action Recognition

Authors: Shiwen Zhang, Sheng Guo, Limin Wang, Weilin Huang, Matthew Scott (pp. 12862-12869)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The proposed KINet achieves state-of-the-art performance on the large-scale action recognition benchmark Kinetics-400, with a top-1 accuracy of 77.8%. We further demonstrate the strong transfer capability of our KINet by transferring the Kinetics-trained model to UCF-101, where it obtains 97.8% top-1 accuracy.
Researcher Affiliation | Collaboration | (1) Malong Technologies, Shenzhen, China; (2) Shenzhen Malong Artificial Intelligence Research Center, Shenzhen, China; (3) State Key Lab for Novel Software Technology, Nanjing University, China
Pseudocode | No | The paper describes its algorithms and modules in text and diagrams, but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code or a link to a code repository.
Open Datasets | Yes | To verify the effectiveness of our KINet, we conduct experiments on a large-scale action recognition dataset Kinetics-400 (Carreira and Zisserman 2017), which contains 400 action categories, with about 240k videos for training and 20k videos for validation. We then examine the generalization ability of our KINet by transferring the learned representation to a small dataset UCF-101 (Soomro, Zamir, and Shah 2012), containing 101 action categories with 13,320 videos in total.
Dataset Splits | Yes | To verify the effectiveness of our KINet, we conduct experiments on a large-scale action recognition dataset Kinetics-400 (Carreira and Zisserman 2017), which contains 400 action categories, with about 240k videos for training and 20k videos for validation. [...] For UCF-101, we follow (Wang et al. 2016a) to fine-tune the pretrained weights on Kinetics, where all but the first batch normalization layer are frozen and the model is trained for 80 epochs. Inference: for a fair comparison, we also follow (Wang et al. 2016a) by uniformly sampling 25 segments from each video and selecting one frame from each segment. (Minimal sketches of this sampling and of the partial-BN fine-tuning recipe follow the table.)
Hardware Specification | No | The paper does not specify the hardware used for the experiments (e.g., GPU models, CPU types, memory).
Software Dependencies | No | The paper mentions various models and datasets but does not list specific software dependencies with version numbers (e.g., Python version, library versions).
Experiment Setup | Yes | We utilize the SGD optimizer with the initial learning rate set to 0.01, which drops by a factor of 10 at epochs 20, 40 and 60. The model is trained for 70 epochs in total. We set the weight decay to 10^-5 and the momentum to 0.9. (A sketch of this schedule follows the table.)
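
The inference protocol quoted under Dataset Splits (25 uniformly spaced segments per video, one frame per segment) follows the TSN-style sampling of Wang et al. (2016a). Below is a minimal sketch of that sampling; the paper releases no code, so the function name sample_frame_indices and the choice of the center frame within each segment are assumptions.

```python
import numpy as np

def sample_frame_indices(num_frames: int, num_segments: int = 25) -> np.ndarray:
    """Uniformly split a video into `num_segments` segments and pick
    one frame (here, the center frame) from each, TSN-style."""
    segment_len = num_frames / num_segments
    # Center of each segment, clipped to valid frame indices.
    indices = np.floor((np.arange(num_segments) + 0.5) * segment_len)
    return np.clip(indices, 0, num_frames - 1).astype(int)

# Example: a 300-frame video yields 25 evenly spaced indices: 6, 18, 30, ..., 294
print(sample_frame_indices(300))
```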
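The UCF-101 fine-tuning recipe ("all but the first batch normalization layer frozen") is the partial-BN strategy from the same TSN line of work. A minimal sketch, assuming a PyTorch model (the paper does not name its framework): freeze_bn_except_first is a hypothetical helper, and in practice it would be re-applied after every call to model.train(), since train() resets BN layers to training mode.

```python
import torch.nn as nn

def freeze_bn_except_first(model: nn.Module) -> None:
    """Freeze every BatchNorm layer except the first one (partial BN)."""
    first_bn_seen = False
    for module in model.modules():
        if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            if not first_bn_seen:
                first_bn_seen = True  # keep the first BN trainable
                continue
            module.eval()  # freeze running mean/variance updates
            for p in module.parameters():
                p.requires_grad = False  # freeze affine scale/shift

# Usage on a toy backbone: only the first BatchNorm2d stays trainable.
model = nn.Sequential(
    nn.Conv2d(3, 64, 3), nn.BatchNorm2d(64), nn.ReLU(),
    nn.Conv2d(64, 64, 3), nn.BatchNorm2d(64),
)
freeze_bn_except_first(model)
```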
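The Experiment Setup row maps directly onto a standard SGD configuration with a step learning-rate schedule. The sketch below again assumes PyTorch; the placeholder model and the empty training loop are illustrative only.

```python
import torch

# Placeholder standing in for KINet (hypothetical; no code is released).
model = torch.nn.Linear(2048, 400)

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.01,           # initial learning rate from the paper
    momentum=0.9,      # momentum from the paper
    weight_decay=1e-5, # weight decay (10^-5) from the paper
)

# LR drops by a factor of 10 at epochs 20, 40 and 60.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[20, 40, 60], gamma=0.1
)

for epoch in range(70):  # 70 epochs in total
    # ... one training epoch over Kinetics-400 would go here ...
    scheduler.step()
```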