Towards To-a-T Spatio-Temporal Focus for Skeleton-Based Action Recognition

Authors: Lipeng Ke, Kuan-Chuan Peng, Siwei Lyu
Pages: 1131-1139

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | STF outperforms the state-of-the-art methods on the NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton 400 datasets in all 15 settings over different views, subjects, setups, and input modalities, and STF also shows better accuracy on scarce data and dataset shifting settings.
Researcher Affiliation | Collaboration | Lipeng Ke (1), Kuan-Chuan Peng (2), Siwei Lyu (1); (1) University at Buffalo, State University of New York; (2) Mitsubishi Electric Research Laboratories; lipengke@buffalo.edu, kpeng@merl.com, siweilyu@buffalo.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper references 'MS-G3D official code. https://github.com/kenziyuliu/MS-G3D.git. Accessed: 2020-08-08.' for the baseline model, but does not provide a link or explicit statement for the open-source code of their own proposed STF method.
Open Datasets | Yes | We conduct experiments on three benchmark datasets, namely, the NTU RGB+D 60 (Shahroudy et al. 2016), NTU RGB+D 120 (Liu et al. 2019), and Kinetics Skeleton 400 (Kay et al. 2017) datasets (denoted as NTU-60, NTU-120, and Kinetics-400, respectively).
Dataset Splits | No | The paper describes specific training and testing splits for the NTU RGB+D 60 and NTU RGB+D 120 datasets (e.g., 'cross-Subject (x-sub), where the dataset is equally split as training and testing sets of 20 subjects each'), but does not explicitly provide details for a separate validation dataset split.
Hardware Specification | No | The paper mentions 'MS Kinect' as hardware used for data capture but does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using 'MS-G3D' as the backbone and 'OpenPose' for data conversion, but does not provide specific version numbers for these or any other software dependencies needed to replicate the experiment.
Experiment Setup | Yes | The MS-G3D baseline model is trained using SGD with momentum 0.9, batch size 32, initial learning rate 0.05, and weight decay 0.0005, and the base learning rate is adjusted accordingly for different settings. For NTU-60, NTU-120, and Kinetics-400, the learning rate is decayed at [20, 35, 45], [20, 35, 50], [25, 40, 55] epochs, respectively. After that, we pre-train the STF model from the MS-G3D baseline model, with lower initial learning rates {10^-3, 5 × 10^-4, 10^-4}. We empirically set λe/λd/λc/λGk as 0.01/0.1/0.1/0.01, respectively, such that all loss terms have comparable absolute ranges.
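
The baseline learning-rate schedule in the Experiment Setup row can be illustrated with a short sketch. This is not the authors' code: the function name `lr_at_epoch`, the 0-indexed epoch convention, and above all the decay factor `GAMMA = 0.1` are assumptions, since the row gives the milestone epochs and the initial rate but not the factor applied at each milestone.

```python
from bisect import bisect_right

# Milestone epochs reported in the Experiment Setup row.
MILESTONES = {
    "NTU-60": [20, 35, 45],
    "NTU-120": [20, 35, 50],
    "Kinetics-400": [25, 40, 55],
}

BASE_LR = 0.05  # initial learning rate reported for the MS-G3D baseline
GAMMA = 0.1     # ASSUMPTION: the decay factor is not stated in the row above


def lr_at_epoch(epoch, milestones, base_lr=BASE_LR, gamma=GAMMA):
    """Return the step-decayed learning rate for a 0-indexed epoch."""
    # Count how many milestones have been reached by this epoch.
    n_decays = bisect_right(milestones, epoch)
    return base_lr * gamma ** n_decays


if __name__ == "__main__":
    for name, steps in MILESTONES.items():
        rates = [lr_at_epoch(e, steps) for e in (0, steps[0], steps[1], steps[2])]
        print(name, rates)
```

Under these assumptions the rate stays at 0.05 until the first milestone and is multiplied by the assumed factor each time a milestone epoch is reached. The row also says the base rate is adjusted for different settings, so `base_lr` is left as a parameter.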