Hierarchical Consistent Contrastive Learning for Skeleton-Based Action Recognition with Growing Augmentations

Authors: Jiahang Zhang, Lilang Lin, Jiaying Liu

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that HiCLR outperforms the state-of-the-art methods notably on three large-scale datasets, i.e., NTU60, NTU120, and PKUMMD.
Researcher Affiliation | Academia | Jiahang Zhang, Lilang Lin, Jiaying Liu* Wangxuan Institute of Computer Technology, Peking University, Beijing, China {zjh2020, linlilang, liujiaying}@pku.edu.cn
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | Yes | Our project is publicly available at: https://jhang2020.github.io/Projects/HiCLR/HiCLR.html.
Open Datasets | Yes | 1) NTU RGB+D Dataset 60 (NTU60) (Shahroudy et al. 2016) is a large-scale dataset that contains 56,578 samples with 60 action categories and 25 joints. 2) NTU RGB+D Dataset 120 (NTU120) (Liu et al. 2019) is an extension to NTU60. 114,480 videos are collected with 120 action categories. 3) PKU Multi-Modality Dataset (PKUMMD) (Liu et al. 2020a) is a large-scale dataset covering a multi-modality 3D understanding of human actions with almost 20,000 instances and 51 action labels.
Dataset Splits | Yes | We follow the two recommended protocols: a) Cross-Subject (xsub): the data for training and testing are collected from different subjects. b) Cross-View (xview): the data for training and testing are collected from different camera views. ...Two recommended protocols are adopted: a) Cross-Subject (xsub): the data for training and testing are collected from 106 different subjects. b) Cross-Setup (xset): the data for training and testing are collected from 32 different setups. ...The cross-subject protocol is adopted. ...In semi-supervised evaluation, we pre-train the encoder with all unlabeled data, and then train the whole model with randomly sampled 1%, 10% of the training data.
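The semi-supervised protocol quoted above (pre-train on all unlabeled data, then fine-tune on a randomly sampled 1% or 10% of the labeled training set) boils down to drawing a fixed random subset of training indices. A minimal sketch follows; the function name and seed handling are illustrative assumptions, not taken from the authors' code:

```python
import random


def sample_labeled_subset(num_train, fraction, seed=0):
    """Randomly pick `fraction` of the training indices to keep labels for
    (e.g. fraction=0.01 or 0.10); the remaining samples are treated as
    unlabeled during fine-tuning. A fixed seed makes the split reproducible."""
    rng = random.Random(seed)
    k = max(1, round(num_train * fraction))
    return sorted(rng.sample(range(num_train), k))
```

For example, with roughly 40,000 NTU60 xsub training clips, a 1% split keeps about 400 labeled clips for the supervised fine-tuning stage.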
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, or memory specifications) used for running the experiments.
Software Dependencies | No | The paper mentions using "ST-GCN" and "DSTA-Net" as backbones, and the "SGD optimizer", but does not specify any software versions (e.g., PyTorch version, Python version, specific library versions).
Experiment Setup | Yes | All skeleton data are pre-processed into 50 frames. We reduce the number of channels in each graph convolution layer to 1/4 of the original setting for ST-GCN and 1/2 for DSTA-Net, respectively. The dimension of the final output feature is 128 and the size of the memory bank M is set to 32,768. The model is trained for 300 epochs with a batch-size of 128 using the SGD optimizer. λh is set to 0.5.
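The hyperparameters quoted above can be gathered into one configuration object for a reimplementation attempt. This is a hypothetical sketch assembled from the reported values only; the class name and field names are assumptions, not the authors' actual configuration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HiCLRConfig:
    """Hyperparameters as reported in the paper's experiment setup."""
    # Data: skeleton sequences pre-processed to a fixed temporal length
    num_frames: int = 50
    # Backbone width relative to the original networks
    stgcn_channel_scale: float = 0.25    # 1/4 of original ST-GCN channels
    dstanet_channel_scale: float = 0.5   # 1/2 of original DSTA-Net channels
    # Contrastive feature head and memory bank M
    feature_dim: int = 128
    memory_bank_size: int = 32_768
    # Optimization
    epochs: int = 300
    batch_size: int = 128
    optimizer: str = "SGD"
    # Weight of the hierarchical loss term (λh in the paper)
    lambda_h: float = 0.5


cfg = HiCLRConfig()
```

Freezing the dataclass guards against accidental mutation of the recorded settings during training runs.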