Spatio-Temporal Graph Routing for Skeleton-Based Action Recognition

Authors: Bin Li, Xi Li, Zhongfei Zhang, Fei Wu

AAAI 2019, pp. 8561-8568

Reproducibility assessment (variable, result, and supporting LLM response):

Research Type: Experimental
LLM Response: "Experimental results on two benchmark datasets (NTU-RGB+D and Kinetics) demonstrate the effectiveness against the state-of-the-art."

Researcher Affiliation: Academia
LLM Response: "Bin Li,1 Xi Li,2 Zhongfei Zhang,1 Fei Wu2. 1College of Information Science & Electronic Engineering, Zhejiang University, Hangzhou, China; 2College of Computer Science and Technology, Zhejiang University, Hangzhou, China. {bin_li, xilizju, zhongfei}@zju.edu.cn, wufei@cs.zju.edu.cn"

Pseudocode: No
LLM Response: "The paper describes its methods using prose and mathematical equations but does not include any formally structured pseudocode or algorithm blocks."

Open Source Code: No
LLM Response: "The paper does not provide any concrete access information, such as a repository link or an explicit statement of code release, for the methodology described."

Open Datasets: Yes
LLM Response: "NTU-RGB+D. NTU-RGB+D (Shahroudy et al. 2016) is a widely used large-scale skeleton-based human action recognition dataset. Kinetics. DeepMind Kinetics is currently one of the largest human action datasets... Following the previous practice (Yan, Xiong, and Lin 2018), we first extract raw 2D coordinates with the help of the OpenPose toolbox (Cao et al. 2017), then apply our model. In practice, we use the released data from (Yan, Xiong, and Lin 2018) to evaluate our model."

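Since the paper evaluates on the skeleton data released with ST-GCN (Yan, Xiong, and Lin 2018), a minimal loading sketch may help readers locate the right tensors. The file names and the (N, C, T, V, M) layout below follow the conventions of that public release, not anything specified in this paper, so treat them as assumptions:

```python
import pickle
import numpy as np

# The released Kinetics-skeleton package stores all clips as one large array
# plus a label pickle; file names and layout follow that release's convention.
data = np.load("kinetics/train_data.npy", mmap_mode="r")
with open("kinetics/train_label.pkl", "rb") as f:
    sample_names, labels = pickle.load(f)

# Layout (N, C, T, V, M): N clips, C=3 channels (x, y, OpenPose confidence),
# T=300 frames, V=18 joints, M=2 bodies per clip.
print(data.shape, len(labels))
```
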
Dataset Splits: Yes
LLM Response: "NTU-RGB+D recommends two evaluation protocols: 1) Cross-subject (X-Sub): the training and testing sets are divided into 40,320 and 16,560 clips respectively according to the experiment subjects. 2) Cross-view (X-View): the training set is collected from camera views 2 and 3 (37,920 clips), while the evaluation set is collected from camera view 1 (18,960 clips)."

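For readers reconstructing these protocols, here is a small sketch of how the two splits are typically derived from NTU-RGB+D file names. The filename pattern (e.g. "S001C002P003R002A013") and the X-Sub training-subject list come from the dataset's standard convention (Shahroudy et al. 2016); both are assumptions on our part, not details quoted above:

```python
import re

# Standard X-Sub training subjects and X-View training cameras for NTU-RGB+D.
XSUB_TRAIN_SUBJECTS = {1, 2, 4, 5, 8, 9, 13, 14, 15, 16, 17,
                       18, 19, 25, 27, 28, 31, 34, 35, 38}
XVIEW_TRAIN_CAMERAS = {2, 3}  # camera view 1 is held out for evaluation

def assign_split(filename: str, protocol: str) -> str:
    """Return 'train' or 'test' for one skeleton file under a protocol."""
    camera = int(re.search(r"C(\d{3})", filename).group(1))   # Cxxx = camera
    subject = int(re.search(r"P(\d{3})", filename).group(1))  # Pxxx = subject
    if protocol == "xsub":
        return "train" if subject in XSUB_TRAIN_SUBJECTS else "test"
    if protocol == "xview":
        return "train" if camera in XVIEW_TRAIN_CAMERAS else "test"
    raise ValueError(f"unknown protocol: {protocol}")

assert assign_split("S001C002P003R002A013", "xview") == "train"
assert assign_split("S001C002P003R002A013", "xsub") == "test"
```
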
Hardware Specification: Yes
LLM Response: "All experiments are conducted on 4 GTX 1080Ti GPUs."

Software Dependencies: No
LLM Response: "The paper mentions using an 'SGD optimizer' but does not provide specific version numbers for any software libraries or dependencies (e.g., PyTorch, TensorFlow, Python, CUDA)."

Experiment Setup: Yes
LLM Response: "As for training, the whole network is trained with the SGD optimizer, with learning rate 0.1 for ST-GCN and 0.01 for STGR. The weight decay is 1e-4 and the batch size is set to 32. The balance parameter λ between the classification loss and the L1 loss is set to 0.2, since we mainly focus on the classification result. We divide the learning rate by 10 for both modules when the validation loss stops decreasing for 5 epochs. Inspired by recent success (Li et al. 2018a) on skeleton-based action recognition, a two-stream scheme is applied to fuse both skeleton and motion features. We need 60 training epochs for model convergence."

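To make the quoted hyperparameters concrete, below is a minimal PyTorch training-loop sketch under stated assumptions: the network is a toy stand-in (the authors release no code), the momentum value is not reported, and applying the L1 term to a learnable routing-weight parameter is only our reading of the paper's description:

```python
import torch
import torch.nn as nn

# Toy stand-in for the two-module network; `stgcn`, `stgr`, and `routing`
# are hypothetical names for illustration, not the authors' architecture.
class TwoBranchNet(nn.Module):
    def __init__(self, in_dim=150, num_classes=60):
        super().__init__()
        self.stgcn = nn.Linear(in_dim, num_classes)       # placeholder for ST-GCN
        self.stgr = nn.Linear(in_dim, num_classes)        # placeholder for STGR
        self.routing = nn.Parameter(torch.randn(25, 25))  # hypothetical routing weights

    def forward(self, x):
        return self.stgcn(x) + self.stgr(x), self.routing

model = TwoBranchNet()

# Per-module learning rates as reported: 0.1 for ST-GCN, 0.01 for STGR.
optimizer = torch.optim.SGD(
    [
        {"params": model.stgcn.parameters(), "lr": 0.1},
        {"params": list(model.stgr.parameters()) + [model.routing], "lr": 0.01},
    ],
    momentum=0.9,        # assumption; the paper does not state a momentum value
    weight_decay=1e-4,   # as reported
)
# "Divide the learning rate by 10 when validation loss stops decreasing for
# 5 epochs" maps onto ReduceLROnPlateau with factor=0.1, patience=5.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=5
)

criterion = nn.CrossEntropyLoss()
lam = 0.2  # balance parameter λ between classification loss and L1 loss

for epoch in range(60):  # 60 epochs reported for convergence
    for x, y in [(torch.randn(32, 150), torch.randint(0, 60, (32,)))]:  # batch size 32; dummy data
        logits, routing = model(x)
        # Our reading: the L1 term regularizes the learned routing weights.
        loss = criterion(logits, y) + lam * routing.abs().mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    val_loss = loss.item()  # stand-in for a real validation pass
    scheduler.step(val_loss)
```

The two-stream scheme the paper mentions would train a second copy of this setup on motion features (typically frame-to-frame coordinate differences) and fuse the two streams' scores; that fusion step is outside this sketch.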