Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training
Authors: Renrui Zhang, Ziyu Guo, Peng Gao, Rongyao Fang, Bin Zhao, Dong Wang, Yu Qiao, Hongsheng Li
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the state-of-the-art performance of Point-M2AE for 3D representation learning. With a frozen encoder after pre-training, Point-M2AE achieves 92.9% accuracy for linear SVM on ModelNet40, even surpassing some fully trained methods. By fine-tuning on downstream tasks, Point-M2AE achieves 86.43% accuracy on ScanObjectNN, +3.36% over the second-best, and largely benefits the few-shot classification, part segmentation and 3D object detection with the hierarchical pre-training scheme. (A hedged sketch of the linear-SVM evaluation protocol is given after this table.) |
| Researcher Affiliation | Collaboration | Renrui Zhang1,2, Ziyu Guo2, Rongyao Fang1, Bin Zhao2, Dong Wang2, Yu Qiao2, Hongsheng Li1,3, Peng Gao2; 1 CUHK-SenseTime Joint Laboratory, The Chinese University of Hong Kong; 2 Shanghai AI Laboratory; 3 Centre for Perceptual and Interactive Intelligence Limited |
| Pseudocode | No | The paper describes methods with text and figures but does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/ZrrSkywalker/Point-M2AE. |
| Open Datasets | Yes | We pre-train our Point-M2AE on ShapeNet [6] dataset, which contains 57,448 synthetic 3D shapes of 55 categories. We fine-tune Point-M2AE on two shape classification datasets: the widely adopted ModelNet40 [44] and the challenging ScanObjectNN [38]. We evaluate Point-M2AE for part segmentation on ShapeNetPart [48]... we apply Point-M2AE to serve as the feature backbone on the indoor ScanNetV2 [9] dataset. |
| Dataset Splits | Yes | We pre-train our Point-M2AE on ShapeNet [6] dataset... We test the 3D representation capability of Point-M2AE via linear evaluation on ModelNet40 [44]. We sample 1,024 points from each 3D shape of ModelNet40... We fine-tune Point-M2AE on two shape classification datasets: the widely adopted ModelNet40 [44] and the challenging ScanObjectNN [38]. We follow Point-BERT to use the voting strategy [25] for fair comparison on ModelNet40. (This implies the standard benchmark splits are used, further confirmed by the paper checklist: '3. (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes]'.) |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | Settings. We pre-train our Point-M2AE on ShapeNet [6] dataset... We set the stage number S as 3, and construct a 3-stage encoder and a 2-stage decoder for hierarchical learning. We adopt 5 blocks in each encoder stage, but only 1 block per stage for the lightweight decoder. For the 3-scale point clouds, we set the point numbers and token dimensions respectively as {512, 256, 64} and {96, 192, 384}. We also set different k for the k-NN at different scales, which are {16, 8, 8}. We mask the highest scale of point clouds with a high ratio of 80% and set 6 heads for all the attention modules. The detailed training settings are in the Appendix. (A hedged sketch of the multi-scale grouping implied by these settings is given after this table.) |
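
A hedged sketch of the linear-SVM evaluation protocol quoted in the Research Type row: a frozen, pre-trained encoder extracts one global feature per ModelNet40 shape, and a linear SVM is fit on those features. The `encoder` interface, the helper names, and the `C` value are assumptions for illustration, not taken from the official repository.

```python
import numpy as np
import torch
from sklearn.svm import LinearSVC

@torch.no_grad()
def extract_features(encoder, points, batch_size=64, device="cuda"):
    """points: (N, 1024, 3) array of sampled ModelNet40 point clouds."""
    encoder.eval()
    feats = []
    for i in range(0, len(points), batch_size):
        batch = torch.as_tensor(points[i:i + batch_size],
                                dtype=torch.float32, device=device)
        # Assumed: the frozen encoder returns one global embedding per shape.
        feats.append(encoder(batch).cpu().numpy())
    return np.concatenate(feats, axis=0)

train_feats = extract_features(encoder, train_points)
test_feats = extract_features(encoder, test_points)

svm = LinearSVC(C=0.01)   # C is a typical choice here, not reported in the paper
svm.fit(train_feats, train_labels)
print("linear SVM accuracy:", svm.score(test_feats, test_labels))
```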
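
The Experiment Setup row fixes the multi-scale geometry: point numbers {512, 256, 64} and k-NN sizes {16, 8, 8} across the three scales. Below is a minimal sketch of building that 3-scale point pyramid with plain farthest point sampling and k-NN grouping; the official code relies on optimized CUDA ops, and the function names here are hypothetical.

```python
import torch

POINTS_PER_SCALE = [512, 256, 64]   # from the paper's settings
KNN_PER_SCALE = [16, 8, 8]

def farthest_point_sample(xyz, n_samples):
    """Greedy farthest point sampling. xyz: (N, 3) -> indices (n_samples,)."""
    n = xyz.shape[0]
    selected = torch.zeros(n_samples, dtype=torch.long)
    dist = torch.full((n,), float("inf"))
    farthest = torch.randint(0, n, (1,)).item()
    for i in range(n_samples):
        selected[i] = farthest
        d = ((xyz - xyz[farthest]) ** 2).sum(dim=-1)
        dist = torch.minimum(dist, d)
        farthest = torch.argmax(dist).item()
    return selected

def build_pyramid(xyz):
    """xyz: (N, 3) raw point cloud -> list of (centers, knn_indices) per scale."""
    scales = []
    current = xyz
    for n_pts, k in zip(POINTS_PER_SCALE, KNN_PER_SCALE):
        centers = current[farthest_point_sample(current, n_pts)]
        # Group each center with its k nearest neighbours at the previous scale.
        dists = torch.cdist(centers, current)            # (n_pts, len(current))
        knn_idx = dists.topk(k, largest=False).indices   # (n_pts, k)
        scales.append((centers, knn_idx))
        current = centers
    return scales

# During pre-training, tokens at the coarsest scale (64 points) would then be
# masked at the 80% ratio stated above; that step is omitted in this sketch.
```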