3D Feature Prediction for Masked-AutoEncoder-Based Point Cloud Pretraining
Authors: Siming Yan, Yuqi Yang, Yu-Xiao Guo, Hao Pan, Peng-Shuai Wang, Xin Tong, Yang Liu, Qixing Huang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted a series of experiments and ablation studies to validate the efficacy and superiority of our masked 3D feature prediction approach, in short MaskFeat3D, for point cloud pretraining. |
| Researcher Affiliation | Collaboration | The University of Texas at Austin; Microsoft Research Asia; Peking University. {siming, huangqx}@cs.utexas.edu, wangps@hotmail.com, {t-yuqyan, Yuxiao.Guo, haopan, yangliu, xtong}@microsoft.com |
| Pseudocode | No | The paper includes figures illustrating the network architecture and pretraining pipeline (e.g., Figure 2), but no explicitly labeled "Pseudocode" or "Algorithm" blocks, nor any structured code-like text. |
| Open Source Code | Yes | The code is available at https://github.com/SimingYan/MaskFeat3D. |
| Open Datasets | Yes | We choose the ShapeNet (Chang et al., 2015) dataset for our pretraining, following the practice of Point-BERT (Yu et al., 2022) and previous 3D MAE-based approaches (Pang et al., 2022; Zhang et al., 2022; Liu et al., 2022). |
| Dataset Splits | Yes | ModelNet40 is a widely used synthetic dataset that comprises 40 classes and contains 9832 training objects and 2468 test objects. |
| Hardware Specification | Yes | All models were trained for 300 epochs on eight 16 GB Nvidia V100 GPUs. |
| Software Dependencies | No | The paper states "We implemented all pretraining models in PyTorch and used the AdamW optimizer," but it does not provide specific version numbers for PyTorch or other software dependencies. |
| Experiment Setup | Yes | We set K = 128 in FPS, use the k = 32 nearest points to form each point patch, and find a masking ratio of 60% to work best empirically. The number of transformer blocks in the decoder is 4. The learning rates of the encoder and the decoder are set to 10^-3 and 10^-4, respectively. Standard data augmentations such as rotation, scaling, and translation are employed. All models were trained for 300 epochs on eight 16 GB Nvidia V100 GPUs. The total batch size is 64. |
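The patching and masking hyperparameters quoted above (K = 128 FPS centers, k = 32 nearest neighbors per patch, 60% masking ratio) can be sketched as follows. This is a minimal numpy illustration of the standard FPS-plus-kNN grouping used by MAE-style point cloud methods, not the authors' released code; function names and the greedy FPS implementation are our own assumptions.

```python
import numpy as np

def farthest_point_sample(points, K=128):
    # Hypothetical greedy FPS: repeatedly pick the point farthest
    # from the already-chosen seeds (paper uses K = 128).
    n = points.shape[0]
    chosen = [0]
    dist = np.full(n, np.inf)
    for _ in range(K - 1):
        dist = np.minimum(dist, np.linalg.norm(points - points[chosen[-1]], axis=1))
        chosen.append(int(dist.argmax()))
    return np.array(chosen)

def group_and_mask(points, K=128, k=32, mask_ratio=0.6, seed=0):
    # Each FPS seed gathers its k nearest points into one patch,
    # then a random 60% of the patches are marked as masked.
    rng = np.random.default_rng(seed)
    seeds = farthest_point_sample(points, K)
    d = np.linalg.norm(points[None, :, :] - points[seeds][:, None, :], axis=-1)
    patches = np.argsort(d, axis=1)[:, :k]          # (K, k) point indices
    masked = rng.choice(K, size=int(mask_ratio * K), replace=False)
    return patches, masked

pts = np.random.default_rng(1).standard_normal((2048, 3))
patches, masked = group_and_mask(pts)
print(patches.shape, len(masked))                   # (128, 32) 76
```

With these settings, 76 of the 128 patches (60%, rounded down) are hidden from the encoder; only the visible patches are encoded, and the 4-block decoder predicts features for the masked ones.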