Hyperbolic Self-paced Learning for Self-supervised Skeleton-based Action Representations
Authors: Luca Franco, Paolo Mandica, Bharti Munjal, Fabio Galasso
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | When tested on three established skeleton-based action recognition datasets, HYSP outperforms the state-of-the-art on PKU-MMD I, as well as on 2 out of 3 downstream tasks on NTU-60 and NTU-120. Code is available at https://github.com/paolomandica/HYSP. |
| Researcher Affiliation | Academia | Luca Franco¹, Paolo Mandica¹, Bharti Munjal¹·², Fabio Galasso¹ (¹Sapienza University of Rome, ²Technical University of Munich) |
| Pseudocode | No | The paper includes mathematical equations for its model, but no structured pseudocode or algorithm blocks are provided. |
| Open Source Code | Yes | Code is available at https://github.com/paolomandica/HYSP. |
| Open Datasets | Yes | NTU RGB+D 60 Dataset (Shahroudy et al., 2016). This contains 56,578 video sequences divided into 60 action classes, captured with three concurrent Kinect V2 cameras from 40 distinct subjects. The dataset follows two evaluation protocols: cross-subject (xsub), where the subjects are split evenly into train and test sets, and cross-view (xview), where the samples of one camera are used for testing and the others are used for training. |
| Dataset Splits | Yes | The dataset follows two evaluation protocols: cross-subject (xsub), where the subjects are split evenly into train and test sets, and cross-view (xview), where the samples of one camera are used for testing and the others are used for training. (...) Semi-supervised Protocol. Encoder and linear classifier are finetuned with 10% of the labeled data. |
| Hardware Specification | Yes | Training on 4 Nvidia Tesla A100 GPUs takes approximately 8 hours. |
| Software Dependencies | No | The paper mentions software components such as 'ST-GCN', 'Riemannian SGD', 'BYOL', and the 'SGD optimizer', but does not provide version numbers for these dependencies or for the underlying framework (e.g., PyTorch). |
| Experiment Setup | Yes | The encoder f is ST-GCN (Yan et al., 2018) with output dimension 1024. Following BYOL (Grill et al., 2020), the projector and predictor MLPs are linear layers with dimension 1024, followed by batch normalization, ReLU, and a final linear layer with dimension 1024. The model is trained with batch size 512 and learning rate 0.2, using the Riemannian SGD (Kochurov et al., 2020) optimizer with momentum 0.9 and weight decay 0.0001. For curriculum learning, across all experiments, we set e1 = 50 and e2 = 100 in Eq. 8. (...) In downstream evaluation, the model is trained for 100 epochs using the SGD optimizer with momentum 0.9 and weight decay 0. |
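
The NTU evaluation protocols quoted in the Open Datasets and Dataset Splits rows amount to simple filters over subject and camera IDs, plus a 10% labeled subset for the semi-supervised protocol. Below is a minimal sketch under that reading; the `Sample` record, its `subject_id`/`camera_id` fields, and the helper names are illustrative assumptions, and the official NTU train-subject and test-camera lists come with the dataset release rather than being reproduced here.

```python
# Hedged sketch of the NTU-60 evaluation-protocol splits described in the table.
import random
from typing import List, NamedTuple, Set, Tuple


class Sample(NamedTuple):
    path: str        # path to the skeleton sequence file
    label: int       # action class
    subject_id: int  # performer ID
    camera_id: int   # Kinect V2 camera ID (1, 2, or 3)


def split_xsub(samples: List[Sample],
               train_subjects: Set[int]) -> Tuple[List[Sample], List[Sample]]:
    """Cross-subject: subjects are split into train and test sets."""
    train = [s for s in samples if s.subject_id in train_subjects]
    test = [s for s in samples if s.subject_id not in train_subjects]
    return train, test


def split_xview(samples: List[Sample],
                test_camera: int) -> Tuple[List[Sample], List[Sample]]:
    """Cross-view: one camera is held out for testing, the others train."""
    train = [s for s in samples if s.camera_id != test_camera]
    test = [s for s in samples if s.camera_id == test_camera]
    return train, test


def semi_supervised_subset(train: List[Sample],
                           fraction: float = 0.1,
                           seed: int = 0) -> List[Sample]:
    """Semi-supervised protocol: keep only 10% of the labeled training data."""
    rng = random.Random(seed)
    k = max(1, int(len(train) * fraction))
    return rng.sample(train, k)
```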
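
The Experiment Setup row maps directly onto a PyTorch configuration. The following is a minimal sketch, assuming PyTorch and the geoopt package (Kochurov et al., 2020) for Riemannian SGD. The `STGCNEncoder` class and its input dimension are placeholders for the real ST-GCN backbone in the authors' repository, and the downstream learning rate is not quoted in the table, so the value used below is a placeholder as well.

```python
# Minimal sketch of the training configuration reported in the table.
import torch
import torch.nn as nn
import geoopt

FEATURE_DIM = 1024  # ST-GCN output dimension reported in the paper
IN_DIM = 150        # placeholder flattened-skeleton input size, not from the paper


class STGCNEncoder(nn.Module):
    """Stand-in for the ST-GCN backbone (Yan et al., 2018); not the real graph-conv stack."""

    def __init__(self, in_dim: int = IN_DIM, out_dim: int = FEATURE_DIM):
        super().__init__()
        self.head = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(x.flatten(1))


def byol_mlp(dim: int = FEATURE_DIM) -> nn.Sequential:
    """Projector/predictor head: linear -> batch norm -> ReLU -> linear, all 1024-d."""
    return nn.Sequential(
        nn.Linear(dim, dim),
        nn.BatchNorm1d(dim),
        nn.ReLU(inplace=True),
        nn.Linear(dim, dim),
    )


encoder = STGCNEncoder()
projector = byol_mlp()
predictor = byol_mlp()

params = (list(encoder.parameters())
          + list(projector.parameters())
          + list(predictor.parameters()))

# Pre-training optimizer with the reported hyper-parameters
# (batch size 512 is set in the dataloader, not here).
optimizer = geoopt.optim.RiemannianSGD(
    params, lr=0.2, momentum=0.9, weight_decay=1e-4
)

# Curriculum-learning boundaries used in Eq. 8 of the paper.
E1, E2 = 50, 100

# Downstream evaluation: 100 epochs of plain SGD with momentum 0.9 and
# weight decay 0; the learning rate here is a placeholder, not from the table.
downstream_optimizer = torch.optim.SGD(
    encoder.parameters(), lr=0.1, momentum=0.9, weight_decay=0.0
)
```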