Hierarchical Consistent Contrastive Learning for Skeleton-Based Action Recognition with Growing Augmentations
Authors: Jiahang Zhang, Lilang Lin, Jiaying Liu
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that HiCLR outperforms the state-of-the-art methods notably on three large-scale datasets, i.e., NTU60, NTU120, and PKUMMD. |
| Researcher Affiliation | Academia | Jiahang Zhang, Lilang Lin, Jiaying Liu* Wangxuan Institute of Computer Technology, Peking University, Beijing, China {zjh2020, linlilang, liujiaying}@pku.edu.cn |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our project is publicly available at: https://jhang2020.github.io/Projects/HiCLR/HiCLR.html. |
| Open Datasets | Yes | 1) NTU RGB+D Dataset 60 (NTU60) (Shahroudy et al. 2016) is a large-scale dataset that contains 56,578 samples with 60 action categories and 25 joints. 2) NTU RGB+D Dataset 120 (NTU120) (Liu et al. 2019) is an extension to NTU60. 114,480 videos are collected with 120 action categories. 3) PKU Multi-Modality Dataset (PKUMMD) (Liu et al. 2020a) is a large-scale dataset covering a multi-modality 3D understanding of human actions with almost 20,000 instances and 51 action labels. |
| Dataset Splits | Yes | We follow the two recommended protocols: a) Cross-Subject (xsub): the data for training and testing are collected from different subjects. b) Cross-View (xview): the data for training and testing are collected from different camera views. ...Two recommended protocols are adopted: a) Cross-Subject (xsub): the data for training and testing are collected from 106 different subjects. b) Cross Setup (xset): the data for training and testing are collected from 32 different setups. ...The cross-subject protocol is adopted. ...In semi-supervised evaluation, we pre-train the encoder with all unlabeled data, and then train the whole model with randomly sampled 1%, 10% of the training data. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, or memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions using "ST-GCN" and "DSTA-Net" as backbones, and the "SGD optimizer" but does not specify any software versions (e.g., PyTorch version, Python version, specific library versions). |
| Experiment Setup | Yes | All skeleton data are pre-processed into 50 frames. We reduce the number of channels in each graph convolution layer to 1/4 of the original setting for ST-GCN and 1/2 for DSTA-Net, respectively. The dimension of the final output feature is 128 and the size of the memory bank M is set to 32,768. The model is trained for 300 epochs with a batch-size of 128 using the SGD optimizer. λh is set to 0.5. |
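The quoted experiment setup can be collected into a runnable configuration sketch. This is a reconstruction from the reported hyperparameters only: the variable names, the stand-in linear encoder, and the learning rate and momentum values are assumptions not stated in the paper (which uses ST-GCN / DSTA-Net backbones with reduced channel counts).

```python
# Hedged sketch of the reported training configuration for HiCLR.
# Names, the toy encoder, and the SGD lr/momentum values are assumptions;
# only the numbers in CONFIG are taken from the paper's quoted setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

CONFIG = {
    "num_frames": 50,            # skeleton data pre-processed into 50 frames
    "feature_dim": 128,          # dimension of the final output feature
    "memory_bank_size": 32768,   # size of the memory bank M
    "epochs": 300,
    "batch_size": 128,
    "lambda_h": 0.5,             # weight λh for the hierarchical loss term
}

# Stand-in encoder: the paper uses ST-GCN (channels reduced to 1/4) or
# DSTA-Net (1/2); a single linear layer here only illustrates wiring the
# 128-d output feature to the SGD optimizer.
encoder = nn.Linear(256, CONFIG["feature_dim"])
optimizer = torch.optim.SGD(encoder.parameters(), lr=0.1, momentum=0.9)

# Memory bank as a fixed-size queue of L2-normalized features
# (a MoCo-style assumption; the paper only states the bank's size).
memory_bank = F.normalize(
    torch.randn(CONFIG["memory_bank_size"], CONFIG["feature_dim"]), dim=1
)
```

As a sanity check, the bank should hold 32,768 unit-norm 128-d vectors, matching the reported setup.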