Action Recognition with Multi-stream Motion Modeling and Mutual Information Maximization

Authors: Yuheng Yang, Haipeng Chen, Zhenguang Liu, Yingda Lyu, Beibei Zhang, Shuang Wu, Zhibo Wang, Kui Ren

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct extensive experiments to empirically evaluate our method on three benchmark action recognition datasets. Empirically, our approach sets the new state-of-the-art performance on three benchmark datasets, NTU RGB+D, NTU RGB+D 120, and NW-UCLA.
Researcher Affiliation | Collaboration | Yuheng Yang¹, Haipeng Chen¹, Zhenguang Liu², Yingda Lyu³, Beibei Zhang⁵, Shuang Wu⁴, Zhibo Wang² and Kui Ren². ¹College of Computer Science and Technology, Jilin University; ²School of Cyber Science and Technology, Zhejiang University; ³Public Computer Education and Research Center, Jilin University; ⁴Black Sesame Technologies; ⁵Zhejiang Lab
Pseudocode | No | The paper does not contain a section or figure explicitly labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | The implementations are released, hoping to facilitate future research. Our code is released to facilitate researchers. https://github.com/ActionR-Group/Stream-GCN
Open Datasets | Yes | We adopt three widely used action recognition datasets, namely NTU-RGB+D [Shahroudy et al., 2016], NTU-RGB+D 120 [Liu et al., 2019a], and Northwestern-UCLA [Wang et al., 2014], to evaluate the proposed method.
Dataset Splits | Yes | NTU-RGB+D provides two sub-benchmarks: (1) Cross-Subject (X-Sub), where data from 20 subjects is used for training and the rest for testing; (2) Cross-View (X-View), which divides the training and test sets by camera view. NTU-RGB+D 120 maintains two benchmarks: (1) Cross-Subject (X-Sub), which assigns 53 subjects to the training set and the other 53 to the test set; (2) Cross-Setup (X-Set), which assigns samples with even setup IDs to the training group and odd setup IDs to the test group. For Northwestern-UCLA, we follow the evaluation protocol of [Wang et al., 2014], where videos captured by the first two cameras serve as training samples and the rest as test samples. (A code sketch of these split rules follows the table.)
Hardware Specification | Yes | We conduct experiments on a computer equipped with an Intel Xeon E5 CPU at 2.1 GHz, three NVIDIA GeForce GTX 1080 Ti GPUs, and 64 GB of RAM.
Software Dependencies | Yes | We leverage PyTorch 1.1 to implement our model.
Experiment Setup | Yes | We apply stochastic gradient descent (SGD) with 0.9 Nesterov momentum to train the Stream-GCN model. For the NTU-RGB+D and NTU-RGB+D 120 datasets, training runs for 65 epochs, with the first 5 serving as warm-up epochs to stabilize training; the initial learning rate is set to 0.1, decays by 0.1 every 35 epochs, and the batch size is 64. For the Northwestern-UCLA dataset, the initial learning rate is set to 0.01 and decays by 0.0001 every 50 epochs, with a batch size of 16. (A PyTorch sketch of this schedule follows the table.)
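
The split rules quoted in the Dataset Splits row can be made concrete in a few lines of Python. The sketch below is illustrative, not the authors' loader: the `sample` record with `subject_id`, `camera_id`, and `setup_id` fields is hypothetical, and the NTU RGB+D cross-subject training-subject IDs are the ones commonly used in the literature following Shahroudy et al. (2016); the official dataset releases remain the authoritative source.

    # Sketch of the split protocols quoted above. `sample` is a hypothetical
    # record; consult the official dataset releases for authoritative splits.

    # NTU RGB+D Cross-Subject: the 20 training-subject IDs commonly used in
    # the literature following Shahroudy et al. (2016).
    NTU60_XSUB_TRAIN_SUBJECTS = {
        1, 2, 4, 5, 8, 9, 13, 14, 15, 16, 17, 18, 19, 25,
        27, 28, 31, 34, 35, 38,
    }

    def ntu60_split(sample, benchmark="x-sub"):
        """Return 'train' or 'test' under the chosen NTU RGB+D benchmark."""
        if benchmark == "x-sub":
            return "train" if sample.subject_id in NTU60_XSUB_TRAIN_SUBJECTS else "test"
        if benchmark == "x-view":
            # Cameras 2 and 3 are the training views; camera 1 is the test view.
            return "train" if sample.camera_id in (2, 3) else "test"
        raise ValueError(f"unknown benchmark: {benchmark}")

    def ntu120_xset_split(sample):
        """NTU RGB+D 120 Cross-Setup: even setup IDs train, odd setup IDs test."""
        return "train" if sample.setup_id % 2 == 0 else "test"

    def nwucla_split(sample):
        """NW-UCLA: the first two cameras train, the remaining camera tests."""
        return "train" if sample.camera_id in (1, 2) else "test"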
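
The NTU RGB+D optimization recipe from the Experiment Setup row likewise maps onto a short PyTorch snippet. This is a minimal sketch under stated assumptions: the warm-up is taken to be linear and the step decay to count from the end of warm-up (neither detail is specified in the quote), `model` is a stand-in, and the released Stream-GCN code is the authoritative reference.

    import torch

    model = torch.nn.Linear(10, 60)  # stand-in for the actual Stream-GCN model

    # SGD with 0.9 Nesterov momentum, as quoted (weight decay is not quoted).
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, nesterov=True)

    EPOCHS, WARMUP = 65, 5  # 65 training epochs, the first 5 as warm-up
    GAMMA, STEP = 0.1, 35   # learning rate decays by 0.1 every 35 epochs

    def lr_scale(epoch):
        """Multiplier on the base LR: assumed linear warm-up, then step decay."""
        if epoch < WARMUP:
            return (epoch + 1) / WARMUP
        return GAMMA ** ((epoch - WARMUP) // STEP)

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_scale)

    for epoch in range(EPOCHS):
        # ... one pass over the training set with batch size 64, calling
        # loss.backward() and optimizer.step() per batch ...
        scheduler.step()  # advance the epoch-level learning-rate schedule

Folding warm-up and step decay into a single LambdaLR keeps the schedule expressible in the quoted PyTorch 1.1; the Northwestern-UCLA run would swap in an initial rate of 0.01 and a batch size of 16.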