Self-Supervised Action Representation Learning from Partial Spatio-Temporal Skeleton Sequences

Authors: Yujie Zhou, Haodong Duan, Anyi Rao, Bing Su, Jiaqi Wang

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method achieves state-of-the-art performance on NTURGB+D 60, NTURGB+D 120 and PKU-MMD under various downstream tasks. Furthermore, to simulate the real-world scenarios, a practical evaluation is performed where some skeleton joints are lost in downstream tasks.
Researcher Affiliation | Academia | Yujie Zhou (1,4), Haodong Duan (3), Anyi Rao (3), Bing Su (1,2*), Jiaqi Wang (4). 1: Gaoling School of Artificial Intelligence, Renmin University of China; 2: Beijing Key Laboratory of Big Data Management and Analysis Methods; 3: Chinese University of Hong Kong; 4: Shanghai AI Laboratory
Pseudocode | No | The paper describes the proposed method in narrative text and with figures (Figure 1, Figure 2) but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/YujieOuO/PSTL.git.
Open Datasets | Yes | NTU-RGB+D 60: the NTU-60 dataset (Shahroudy et al. 2016) is collected by Microsoft Kinect sensors. NTU-RGB+D 120: the NTU-120 dataset (Liu et al. 2019) is the extended version of the NTU-60... PKU-MMD: the PKU-MMD dataset (Liu et al. 2020) is captured via the Kinect v2 sensors from multiple viewpoints.
Dataset Splits | No | The paper describes training and testing sets for the datasets but does not explicitly mention a validation set or the split percentage/methodology needed to reproduce the data partitioning. For instance: 'NTU-RGB+D 60... There are two official dataset splits: 1) Cross-Subject (xsub): half of the subjects belong to the training set, and the rest make up the testing set; 2) Cross-View (xview): training and testing sets are captured by cameras with different views.'
Hardware Specification | No | The paper does not specify any particular hardware used for running the experiments. It details implementation aspects like optimizer, scheduler, batch size, and data augmentation, but lacks information on GPU models, CPU types, or other hardware specifications.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. It mentions the use of ST-GCN as the backbone and the Adam optimizer with a Cosine Annealing scheduler, but without any version information for these or other libraries.
Experiment Setup | Yes | To perform an apple-to-apple comparison with other methods, we follow the same pre-processing methods of CrosSCLR (Li et al. 2021) and AimCLR (Guo et al. 2022), which resize the skeleton sequences to 50 frames. Similarly, ST-GCN (Yan, Xiong, and Lin 2018) with 16 hidden channels is used as the backbone. For all experiments (both representation learning and downstream tasks), we adopt the Adam optimizer and the Cosine Annealing scheduler with 150 epochs. The mini-batch size is 128. ... λ in the loss of each stream is set to 2e-4. A 10-epoch warmup is used for stabilizing the training process. The weight decay is set to 1e-5. Note that for all datasets and evaluation settings, the number of masked joints is 9 in CSM and K = 10 in MATM.
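The reported schedule (150 epochs with a 10-epoch warmup followed by cosine annealing) can be sketched as a per-epoch learning-rate function. This is a minimal illustration, not the authors' code: the base learning rate `base_lr` and floor `min_lr` are assumptions, since the paper excerpt above does not state the initial learning rate.

```python
import math

def lr_at_epoch(epoch, base_lr=1e-3, warmup_epochs=10,
                total_epochs=150, min_lr=0.0):
    """Linear warmup for `warmup_epochs`, then cosine annealing
    to `min_lr` over the remaining epochs (base_lr is an assumed
    value; the paper does not report it)."""
    if epoch < warmup_epochs:
        # Ramp linearly from base_lr / warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Fraction of the post-warmup schedule completed, in [0, 1).
    t = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t))
```

In a PyTorch training loop this would typically be realized with `torch.optim.lr_scheduler.CosineAnnealingLR` plus a warmup wrapper, alongside `Adam(params, weight_decay=1e-5)` as described in the row above.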