Contrastive Predictive Autoencoders for Dynamic Point Cloud Self-Supervised Learning

Authors: Xiaoxiao Sheng, Zhiqiang Shen, Gang Xiao

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on four point cloud sequence benchmarks, and report the results on action recognition and gesture recognition under multiple experimental settings. The performances are comparable with supervised methods and show powerful transferability.
Researcher Affiliation | Academia | Xiaoxiao Sheng*, Zhiqiang Shen*, Gang Xiao, Shanghai Jiao Tong University, {shengxiaoxiao, shenzhiqiang, xiaogang}@sjtu.edu.cn
Pseudocode | No | The paper describes the architecture and processes with text and diagrams (Figure 1, Figure 2) but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not include any statements or links indicating that open-source code for the described methodology is provided.
Open Datasets | Yes | We perform 3D action recognition on MSRAction3D and NTU-RGBD 60, and gesture recognition on NvGesture and SHREC'17 datasets.
Dataset Splits | Yes | MSRAction3D (Li, Zhang, and Liu 2010) ... We use the same training and test splits as (Liu, Yan, and Bohg 2019). NTU-RGBD 60 (Shahroudy et al. 2016) ... Cross-subject and cross-view evaluations are adopted. NvGesture (Molchanov et al. 2016) ... We follow the previous work to split this dataset, where 1050 videos are used for training and 482 videos are for test (Min et al. 2020). SHREC'17 (De Smedt et al. 2017) ... We adopt the same splits of training and test data as previous work (Min et al. 2020). (These protocols are summarized in the split sketch after the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions using the 'Adam optimizer' and 'SGD optimizer' but does not specify version numbers for any software or libraries.
Experiment Setup | Yes | We pretrain for 200 epochs and set the batch size to 88. We use the Adam optimizer and a cosine annealing scheduler with an initial learning rate of 0.0008. Unless otherwise stated, we adopt the pretrained point spatiotemporal encoder for downstream tasks, and add two linear layers with BN and ReLU for finetuning, or one linear layer for linear evaluation. We use the SGD optimizer with momentum 0.9 and a cosine scheduler with a 10-epoch warmup. A batch size of 16 corresponds to a learning rate of 0.01, and we follow the scale-up rule.
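
The dataset-split evidence above can be condensed into a small lookup. The sketch below only restates the protocols quoted from the paper; the dictionary keys and the helper function are hypothetical conveniences, not part of any released code.

    # Illustrative summary of the evaluation protocols quoted in the table.
    # The structure and names here are assumptions for readability only.
    DATASET_SPLITS = {
        "MSRAction3D": {"protocol": "same train/test split as Liu, Yan, and Bohg (2019)"},
        "NTU-RGBD-60": {"protocol": "cross-subject and cross-view evaluations"},
        "NvGesture": {
            "protocol": "split of Min et al. (2020)",
            "train_videos": 1050,
            "test_videos": 482,
        },
        "SHREC'17": {"protocol": "same train/test split as Min et al. (2020)"},
    }

    def describe_split(name: str) -> str:
        """Return a one-line description of the evaluation protocol for a dataset."""
        info = DATASET_SPLITS[name]
        extra = ""
        if "train_videos" in info:
            extra = f" ({info['train_videos']} train / {info['test_videos']} test videos)"
        return f"{name}: {info['protocol']}{extra}"

    if __name__ == "__main__":
        for dataset in DATASET_SPLITS:
            print(describe_split(dataset))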
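
To make the reported experiment setup concrete, here is a minimal PyTorch sketch of the optimization configuration it describes. Only the quoted hyperparameters (200 pretraining epochs, batch size 88, Adam with initial learning rate 0.0008, cosine annealing, two-linear-layer head with BN and ReLU or a single linear layer, SGD with momentum 0.9, 10-epoch warmup, base learning rate 0.01 at batch size 16) come from the paper; the encoder, feature dimension, hidden width, class count, and finetuning epoch count are placeholder assumptions.

    # A minimal sketch of the described setup, assuming placeholder model sizes.
    import math
    import torch
    import torch.nn as nn

    FEAT_DIM, NUM_CLASSES = 1024, 60       # placeholder sizes, not from the paper

    def pretrain_optimizer(encoder: nn.Module):
        """Adam + cosine annealing over 200 epochs, initial lr 0.0008 (batch size 88)."""
        opt = torch.optim.Adam(encoder.parameters(), lr=0.0008)
        sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=200)
        return opt, sched

    def downstream_head(linear_eval: bool = False) -> nn.Module:
        """Two linear layers with BN and ReLU for finetuning, or one linear layer
        for linear evaluation, appended to the pretrained spatiotemporal encoder.
        The 512-dim hidden width is an assumption; the paper does not state it."""
        if linear_eval:
            return nn.Linear(FEAT_DIM, NUM_CLASSES)
        return nn.Sequential(
            nn.Linear(FEAT_DIM, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(inplace=True),
            nn.Linear(512, NUM_CLASSES),
        )

    def downstream_optimizer(model: nn.Module, batch_size: int, epochs: int = 100):
        """SGD (momentum 0.9) with a 10-epoch warmup followed by cosine decay.
        The base learning rate 0.01 at batch size 16 is scaled with batch size;
        the finetuning epoch count is a placeholder."""
        lr = 0.01 * batch_size / 16        # scale-up rule from the paper
        opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)

        def lr_lambda(epoch: int) -> float:
            if epoch < 10:                 # linear warmup over the first 10 epochs
                return (epoch + 1) / 10
            progress = (epoch - 10) / max(1, epochs - 10)
            return 0.5 * (1.0 + math.cos(math.pi * progress))

        sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)
        return opt, sched

This is a reading of the quoted setup, not the authors' implementation; with no released code, details such as the head's hidden width and the exact warmup/decay formulation remain assumptions.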