Action Recognition with Multi-stream Motion Modeling and Mutual Information Maximization
Authors: Yuheng Yang, Haipeng Chen, Zhenguang Liu, Yingda Lyu, Beibei Zhang, Shuang Wu, Zhibo Wang, Kui Ren
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct extensive experiments to empirically evaluate our method on three benchmark action recognition datasets. Empirically, our approach sets the new state-of-the-art performance on three benchmark datasets, NTU RGB+D, NTU RGB+D 120, and NW-UCLA. |
| Researcher Affiliation | Collaboration | Yuheng Yang1, Haipeng Chen1, Zhenguang Liu2, Yingda Lyu3, Beibei Zhang5, Shuang Wu4, Zhibo Wang2 and Kui Ren2. 1College of Computer Science and Technology, Jilin University; 2School of Cyber Science and Technology, Zhejiang University; 3Public Computer Education and Research Center, Jilin University; 4Black Sesame Technologies; 5Zhejiang Lab |
| Pseudocode | No | The paper does not contain a section or figure explicitly labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | The implementations are released, hoping to facilitate future research. https://github.com/ActionR-Group/Stream-GCN |
| Open Datasets | Yes | We adopt three widely used action recognition datasets, namely NTU-RGB+D [Shahroudy et al., 2016], NTU-RGB+D 120 [Liu et al., 2019a], and Northwestern-UCLA [Wang et al., 2014], to evaluate the proposed method. |
| Dataset Splits | Yes | NTU-RGB+D. This dataset provides two sub-benchmarks: (1) Cross-Subject (X-Sub): data for 20 subjects is used as the training data, while the rest is used as test data. (2) Cross-View (X-View) divides the training and test sets according to different camera views. NTU-RGB+D 120. Within the dataset, two benchmarks are maintained: (1) Cross-Subject (X-Sub), which categorizes 53 subjects into the training class and the other 53 subjects into the test class. (2) Cross-Setup (X-Set), which arranges data items with even IDs into the training group and odd IDs into the test group. Northwestern-UCLA. We follow the evaluation protocol mentioned in [Wang et al., 2014], where videos collected by the first two cameras serve as the training samples and the rest serve as test samples. |
| Hardware Specification | Yes | We conduct experiments on a computer equipped with an Intel Xeon E5 CPU at 2.1GHz, three NVIDIA GeForce GTX 1080 Ti GPUs, and 64GB of RAM. |
| Software Dependencies | Yes | We leverage PyTorch 1.1 to implement our model. |
| Experiment Setup | Yes | We apply stochastic gradient descent (SGD) with 0.9 Nesterov momentum to train the Stream-GCN model. For the NTU-RGB+D and NTU-RGB+D 120 datasets, training runs for 65 epochs, with the first 5 serving as warm-up epochs to stabilize training; the initial learning rate is set to 0.1 and decays by a factor of 0.1 every 35 epochs, with a batch size of 64. For the Northwestern-UCLA dataset, the initial learning rate is set to 0.01 and decays by 0.0001 every 50 epochs, with a batch size of 16. |
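The NTU-RGB+D training schedule described above (initial learning rate 0.1, step decay by a factor of 0.1 every 35 epochs, 5 warm-up epochs out of 65) can be sketched as a small Python function. This is only an illustrative reconstruction: the paper does not specify the warm-up form, so a linear warm-up is assumed here, and the function name `stream_gcn_lr` is hypothetical.

```python
def stream_gcn_lr(epoch, base_lr=0.1, warmup_epochs=5, decay_step=35, decay_rate=0.1):
    """Learning rate at a given 0-indexed epoch for the NTU-RGB+D setup.

    Assumes linear warm-up over the first 5 epochs (warm-up form is not
    specified in the paper) followed by step decay: multiply the base
    learning rate by 0.1 once every 35 epochs.
    """
    if epoch < warmup_epochs:
        # Linear ramp from base_lr/warmup_epochs up to base_lr (assumption).
        return base_lr * (epoch + 1) / warmup_epochs
    # Step decay as stated in the setup: x0.1 every 35 epochs.
    return base_lr * decay_rate ** (epoch // decay_step)
```

For example, the rate stays at 0.1 through epoch 34 and drops to 0.01 from epoch 35 onward, which matches the stated 65-epoch schedule with one decay step.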