Knowledge Integration Networks for Action Recognition
Authors: Shiwen Zhang, Sheng Guo, Limin Wang, Weilin Huang, Matthew Scott (pp. 12862–12869)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed KINet achieves the state-of-the-art performance on a large-scale action recognition benchmark Kinetics-400, with a top-1 accuracy of 77.8%. We further demonstrate that our KINet has strong capability by transferring the Kinetics-trained model to UCF-101, where it obtains 97.8% top-1 accuracy. |
| Researcher Affiliation | Collaboration | ¹Malong Technologies, Shenzhen, China; ²Shenzhen Malong Artificial Intelligence Research Center, Shenzhen, China; ³State Key Lab for Novel Software Technology, Nanjing University, China |
| Pseudocode | No | The paper describes algorithms and modules in text and diagrams, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code or a link to a code repository. |
| Open Datasets | Yes | To verify the effectiveness of our KINet, we conduct experiments on a large-scale action recognition dataset Kinetics-400 (Carreira and Zisserman 2017), which contains 400 action categories, with about 240k videos for training and 20k videos for validation. We then examine the generalization ability of our KINet by transferring the learned representation to a small dataset UCF-101 (Soomro, Zamir, and Shah 2012), containing 101 action categories with 13,320 videos in total. |
| Dataset Splits | Yes | To verify the effectiveness of our KINet, we conduct experiments on a large-scale action recognition dataset Kinetics-400 (Carreira and Zisserman 2017), which contains 400 action categories, with about 240k videos for training and 20k videos for validation. [...] For UCF-101, we follow (Wang et al. 2016a) to fine-tune the pretrained weights on Kinetics, where we freeze all but the first batch normalization layer and train the model for 80 epochs. Inference. For fair comparison, we also follow (Wang et al. 2016a) by uniformly sampling 25 segments from each video and selecting one frame out of each segment. |
| Hardware Specification | No | The paper does not specify the hardware used for the experiments (e.g., GPU models, CPU types, memory). |
| Software Dependencies | No | The paper mentions various models and datasets but does not list specific software dependencies with version numbers (e.g., Python version, library versions). |
| Experiment Setup | Yes | We utilize the SGD optimizer with the initial learning rate set to 0.01, which is divided by 10 at epochs 20, 40 and 60. The model is trained for a total of 70 epochs. We set the weight decay to 10⁻⁵ and the momentum to 0.9. |
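The inference protocol quoted above (uniformly sampling 25 segments per video and taking one frame from each, following Wang et al. 2016a) can be sketched as a frame-index selector. This is an illustrative reconstruction, not code from the paper; the function name and the choice of the segment-center frame are assumptions based on the standard TSN-style sparse sampling scheme.

```python
def uniform_sample_frames(num_frames, num_segments=25):
    """Split a video of `num_frames` frames into `num_segments` equal
    segments and return the center frame index of each segment
    (TSN-style sparse sampling; assumed detail, not from the paper)."""
    seg_len = num_frames / num_segments
    return [int(seg_len * i + seg_len / 2) for i in range(num_segments)]

# For a 100-frame video this yields 25 evenly spaced indices: 2, 6, 10, ..., 98.
```

At test time, each of the 25 sampled frames would be passed through the network and the per-frame predictions averaged, as is standard for this evaluation protocol.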
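The training schedule in the Experiment Setup row (initial LR 0.01, divided by 10 at epochs 20, 40 and 60) is a standard multi-step decay. A minimal sketch of that schedule, assuming the milestone epochs and decay factor quoted above (function name and signature are illustrative, not from the paper):

```python
def learning_rate(epoch, base_lr=0.01, milestones=(20, 40, 60), gamma=0.1):
    """Step schedule matching the reported setup: the learning rate is
    multiplied by `gamma` (i.e. divided by 10) at each milestone epoch."""
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * (gamma ** drops)

# Epochs 0-19 train at 0.01, 20-39 at 0.001, 40-59 at 1e-4, 60-69 at 1e-5.
```

In a PyTorch implementation this would typically be expressed with `torch.optim.SGD(..., lr=0.01, momentum=0.9, weight_decay=1e-5)` combined with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[20, 40, 60], gamma=0.1)`; the paper does not state which framework was used.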