Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training

Authors: Renrui Zhang, Ziyu Guo, Peng Gao, Rongyao Fang, Bin Zhao, Dong Wang, Yu Qiao, Hongsheng Li

NeurIPS 2022

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments demonstrate the state-of-the-art performance of Point-M2AE for 3D representation learning. With a frozen encoder after pre-training, Point-M2AE achieves 92.9% accuracy for linear SVM on ModelNet40, even surpassing some fully trained methods. By fine-tuning on downstream tasks, Point-M2AE achieves 86.43% accuracy on ScanObjectNN, +3.36% over the second-best, and largely benefits few-shot classification, part segmentation and 3D object detection with the hierarchical pre-training scheme.
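The linear-SVM protocol quoted above (freeze the pre-trained encoder, then fit a linear SVM on its global features) can be sketched as follows. This is a minimal illustration, not the authors' code: the feature arrays here are random stand-ins for what a frozen Point-M2AE encoder would produce, and the `C` value is an assumption rather than the paper's setting.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Stand-ins for global features from a frozen Point-M2AE encoder
# (384-dim, matching the paper's highest-scale token dimension).
train_feats = rng.normal(size=(100, 384))
train_labels = rng.integers(0, 40, size=100)  # 40 ModelNet40 classes
test_feats = rng.normal(size=(20, 384))

# Linear SVM on frozen features; only this classifier is trained.
clf = LinearSVC(C=0.01, max_iter=5000)
clf.fit(train_feats, train_labels)
preds = clf.predict(test_feats)
```

Accuracy on a real benchmark would then be `np.mean(preds == test_labels)`; with random stand-in features it is of course meaningless.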
Researcher Affiliation Collaboration Renrui Zhang1,2, Ziyu Guo2, Rongyao Fang1, Bin Zhao2, Dong Wang2, Yu Qiao2, Hongsheng Li1,3, Peng Gao2 (corresponding author); 1 CUHK-SenseTime Joint Laboratory, The Chinese University of Hong Kong, 2 Shanghai AI Laboratory, 3 Centre for Perceptual and Interactive Intelligence Limited
Pseudocode No The paper describes methods with text and figures but does not contain structured pseudocode or algorithm blocks.
Open Source Code Yes Code is available at https://github.com/ZrrSkywalker/Point-M2AE.
Open Datasets Yes We pre-train our Point-M2AE on the ShapeNet [6] dataset, which contains 57,448 synthetic 3D shapes of 55 categories. We fine-tune Point-M2AE on two shape classification datasets: the widely adopted ModelNet40 [44] and the challenging ScanObjectNN [38]. We evaluate Point-M2AE for part segmentation on ShapeNetPart [48]... we apply Point-M2AE to serve as the feature backbone on the indoor ScanNetV2 [9] dataset.
Dataset Splits Yes We pre-train our Point-M2AE on the ShapeNet [6] dataset... We test the 3D representation capability of Point-M2AE via linear evaluation on ModelNet40 [44]. We sample 1,024 points from each 3D shape of ModelNet40... We fine-tune Point-M2AE on two shape classification datasets: the widely adopted ModelNet40 [44] and the challenging ScanObjectNN [38]. We follow Point-BERT to use the voting strategy [25] for fair comparison on ModelNet40. (This implies using standard benchmark splits, further confirmed by the ethics statement: '3. (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes]').
Hardware Specification No The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments.
Software Dependencies No The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment.
Experiment Setup Yes Settings. We pre-train our Point-M2AE on the ShapeNet [6] dataset... We set the stage number S as 3, and construct a 3-stage encoder and a 2-stage decoder for hierarchical learning. We adopt 5 blocks in each encoder stage, but only 1 block per stage for the lightweight decoder. For the 3-scale point clouds, we set the point numbers and token dimensions respectively as {512, 256, 64} and {96, 192, 384}. We also set different k for the k-NN at different scales, which are {16, 8, 8}. We mask the highest scale of point clouds with a high ratio of 80% and set 6 heads for all the attention modules. The detailed training settings are in Appendix.
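The hierarchical hyperparameters reported above can be collected into a single configuration sketch. The dictionary keys below are illustrative names, not the authors' actual config schema; only the values are taken from the paper.

```python
# Hedged sketch of Point-M2AE's reported hierarchical settings.
# Key names are hypothetical; values follow the paper's description.
point_m2ae_cfg = {
    "encoder_stages": 3,             # stage number S = 3
    "decoder_stages": 2,             # lightweight 2-stage decoder
    "blocks_per_encoder_stage": 5,
    "blocks_per_decoder_stage": 1,
    "num_points": [512, 256, 64],    # point numbers per scale
    "token_dims": [96, 192, 384],    # token dimensions per scale
    "knn_k": [16, 8, 8],             # k-NN neighborhood size per scale
    "mask_ratio": 0.80,              # masking at the highest scale
    "num_heads": 6,                  # heads for all attention modules
}

# Per-scale settings must line up across the three scales.
assert len(point_m2ae_cfg["num_points"]) == point_m2ae_cfg["encoder_stages"]
```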