3D Feature Prediction for Masked-AutoEncoder-Based Point Cloud Pretraining

Authors: Siming Yan, Yuqi Yang, Yu-Xiao Guo, Hao Pan, Peng-Shuai Wang, Xin Tong, Yang Liu, Qixing Huang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted a series of experiments and ablation studies to validate the efficacy and superiority of our masked 3D feature prediction approach, MaskFeat3D for short, for point cloud pretraining.
Researcher Affiliation | Collaboration | The University of Texas at Austin, Microsoft Research Asia, Peking University. {siming, huangqx}@cs.utexas.edu, {wangps}@hotmail.com, {t-yuqyan, Yuxiao.Guo, haopan, yangliu, xtong}@microsoft.com
Pseudocode | No | The paper includes figures illustrating the network architecture and pretraining pipeline (e.g., Figure 2), but no explicitly labeled "Pseudocode" or "Algorithm" blocks, nor any structured code-like text.
Open Source Code | Yes | The code is available at https://github.com/SimingYan/MaskFeat3D.
Open Datasets | Yes | We choose the ShapeNet (Chang et al., 2015) dataset for our pretraining, following the practice of Point-BERT (Yu et al., 2022) and previous 3D MAE-based approaches (Pang et al., 2022; Zhang et al., 2022; Liu et al., 2022).
Dataset Splits | Yes | ModelNet40 is a widely used synthetic dataset that comprises 40 classes and contains 9832 training objects and 2468 test objects.
Hardware Specification | Yes | All models were trained for 300 epochs on eight 16 GB Nvidia V100 GPUs.
Software Dependencies | No | The paper states "We implemented all pretraining models in PyTorch and used the AdamW optimizer," but it does not provide specific version numbers for PyTorch or other software dependencies.
Experiment Setup | Yes | We set K = 128 in FPS, use the k = 32 nearest points to form each point patch, and find the best masking ratio to be 60% empirically. The number of transformer blocks in the decoder is 4. The learning rates of the encoder and the decoder are set to 10⁻³ and 10⁻⁴, respectively. Standard data augmentations such as rotation, scaling, and translation are employed. All models were trained for 300 epochs on eight 16 GB Nvidia V100 GPUs. The total batch size is 64.
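The patch-construction step described in the setup row (farthest point sampling of K = 128 centers, grouping the k = 32 nearest points per center, then masking 60% of patches) can be sketched as below. This is a minimal NumPy illustration under stated assumptions, not the authors' released code; the function names and the brute-force distance computation are our own.

```python
import numpy as np

def farthest_point_sample(points, K):
    """Greedy farthest point sampling: iteratively pick the point
    farthest from all previously chosen centers (K well-spread seeds)."""
    N = points.shape[0]
    centers = np.zeros(K, dtype=int)
    min_dist = np.full(N, np.inf)  # distance of each point to nearest chosen center
    idx = 0                        # start from the first point (often randomized)
    for i in range(K):
        centers[i] = idx
        d = np.sum((points - points[idx]) ** 2, axis=1)
        min_dist = np.minimum(min_dist, d)
        idx = int(np.argmax(min_dist))
    return centers

def make_masked_patches(points, K=128, k=32, mask_ratio=0.6, rng=None):
    """Form K point patches of k nearest neighbors around FPS centers,
    then randomly mask a mask_ratio fraction of the patches."""
    rng = np.random.default_rng(rng)
    centers = farthest_point_sample(points, K)
    # Brute-force pairwise distances from each center to all points: (K, N)
    d = np.linalg.norm(points[centers][:, None, :] - points[None, :, :], axis=-1)
    patch_idx = np.argsort(d, axis=1)[:, :k]           # (K, k) neighbor indices
    mask = np.zeros(K, dtype=bool)
    n_mask = int(K * mask_ratio)                       # e.g. 76 of 128 patches
    mask[rng.choice(K, size=n_mask, replace=False)] = True  # True = masked
    return points[patch_idx], mask                     # patches: (K, k, 3)
```

For example, on a 1024-point cloud this yields a (128, 32, 3) patch tensor and a boolean mask with 76 masked patches; in the actual pipeline the visible patches would feed the encoder while features of the masked ones are predicted by the decoder.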