Hyperbolic Self-paced Learning for Self-supervised Skeleton-based Action Representations

Authors: Luca Franco, Paolo Mandica, Bharti Munjal, Fabio Galasso

ICLR 2023

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | When tested on three established skeleton-based action recognition datasets, HYSP outperforms the state-of-the-art on PKU-MMD I, as well as on 2 out of 3 downstream tasks on NTU-60 and NTU-120. Code is available at https://github.com/paolomandica/HYSP.
Researcher Affiliation | Academia | Luca Franco (1), Paolo Mandica (1), Bharti Munjal (1,2), Fabio Galasso (1) — (1) Sapienza University of Rome, (2) Technical University of Munich
Pseudocode | No | The paper includes mathematical equations for its model, but no structured pseudocode or algorithm blocks are provided.
Open Source Code | Yes | Code is available at https://github.com/paolomandica/HYSP.
Open Datasets | Yes | NTU RGB+D 60 Dataset (Shahroudy et al., 2016). This contains 56,578 video sequences divided into 60 action classes, captured with three concurrent Kinect V2 cameras from 40 distinct subjects. The dataset follows two evaluation protocols: cross-subject (xsub), where the subjects are split evenly into train and test sets, and cross-view (xview), where the samples of one camera are used for testing and those of the other cameras for training.
Dataset Splits | Yes | The dataset follows two evaluation protocols: cross-subject (xsub), where the subjects are split evenly into train and test sets, and cross-view (xview), where the samples of one camera are used for testing and those of the other cameras for training. (...) Semi-supervised Protocol. Encoder and linear classifier are finetuned with 10% of the labeled data.
Hardware Specification | Yes | Training on 4 Nvidia Tesla A100 GPUs takes approximately 8 hours.
Software Dependencies | No | The paper mentions software components like 'ST-GCN', 'Riemannian SGD', 'BYOL', and 'SGD optimizer', but does not provide specific version numbers for these software dependencies or for the underlying framework such as PyTorch.
Experiment Setup | Yes | The encoder f is ST-GCN (Yan et al., 2018) with output dimension 1024. Following BYOL (Grill et al., 2020), the projector and predictor MLPs are linear layers with dimension 1024, followed by batch normalization, ReLU and a final linear layer with dimension 1024. The model is trained with batch size 512 and learning rate 0.2 using the Riemannian SGD (Kochurov et al., 2020) optimizer with momentum 0.9 and weight decay 0.0001. For curriculum learning, across all experiments, we set e1 = 50 and e2 = 100 in Eq. 8. (...) In downstream evaluation, the model is trained for 100 epochs using the SGD optimizer with momentum 0.9 and weight decay 0.
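The projector/predictor heads and optimizer hyperparameters quoted above can be sketched in PyTorch. This is a minimal illustration, not the HYSP implementation: the helper name `make_mlp_head` is ours, and plain `torch.optim.SGD` stands in for the Riemannian SGD of geoopt (Kochurov et al., 2020) that the paper actually uses, so the sketch stays dependency-light.

```python
import torch
import torch.nn as nn

def make_mlp_head(dim: int = 1024) -> nn.Sequential:
    """BYOL-style head as described in the paper: a linear layer (dim 1024),
    batch norm, ReLU, then a final linear layer (dim 1024).
    (Name and structure here are an illustrative sketch.)"""
    return nn.Sequential(
        nn.Linear(dim, dim),
        nn.BatchNorm1d(dim),
        nn.ReLU(inplace=True),
        nn.Linear(dim, dim),
    )

projector = make_mlp_head()
predictor = make_mlp_head()

# Hyperparameters quoted from the paper: lr 0.2, momentum 0.9, weight decay 1e-4
# (batch size 512 in the real runs). The paper pairs these with Riemannian SGD;
# torch.optim.SGD is a stand-in for this sketch.
optimizer = torch.optim.SGD(
    list(projector.parameters()) + list(predictor.parameters()),
    lr=0.2, momentum=0.9, weight_decay=1e-4,
)

x = torch.randn(8, 1024)           # stand-in batch of 1024-d encoder features
out = predictor(projector(x))      # BYOL pattern: project, then predict
print(out.shape)                   # torch.Size([8, 1024])
```

In the real pipeline the 1024-d input would come from the ST-GCN encoder, and the target branch would keep an EMA copy of the projector, as in BYOL.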