LCM: Locally Constrained Compact Point Cloud Model for Masked Point Modeling

Authors: Yaohua Zha, Naiqi Li, Yanzi Wang, Tao Dai, Hang Guo, Bin Chen, Zhi Wang, Zhihao Ouyang, Shu-Tao Xia

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results show that our compact model significantly surpasses existing Transformer-based models in both performance and efficiency; in particular, our LCM-based Point-MAE model, compared to the Transformer-based model, achieved improvements of 1.84%, 0.67%, and 0.60% in average accuracy on the three variants of ScanObjectNN while reducing parameters by 88% and computation by 73%. (See the parameter-count sketch below the table.)
Researcher Affiliation | Collaboration | 1Tsinghua Shenzhen International Graduate School, Tsinghua University; 2Institute of Visual Intelligence, Pengcheng Laboratory; 3College of Computer Science and Software Engineering, Shenzhen University; 4Harbin Institute of Technology, Shenzhen; 5Bytedance Inc.
Pseudocode | No | The paper describes methods through figures and textual explanations but does not include formal pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/zyh16143998882/LCM.
Open Datasets | Yes | We pre-train our LCM using five different pre-training strategies: Point-BERT [60], MaskPoint [28], Point-MAE [37], Point-M2AE [65], and ACT [8]. For a fair comparison, we use ShapeNet [3] as our pre-training dataset, encompassing over 50,000 distinct 3D models spanning 55 prevalent object categories. We initially assess the overall classification accuracy of our pre-trained models on both real-scanned (ScanObjectNN [44]) and synthetic (ModelNet40 [57]) datasets. We further assess the object detection performance of our pre-trained model on the more challenging scene-level point cloud dataset, ScanNetV2 [6].
Dataset Splits | No | The paper mentions training and testing but does not explicitly provide the specific train/validation/test dataset splits (percentages or counts) or reference predefined splits with sufficient detail for reproduction in the main text.
Hardware Specification | Yes | Because our LCM model is surprisingly lightweight and efficient, we were able to complete the pre-training tasks using just a single 24GB NVIDIA GeForce RTX 3090 GPU. For downstream classification and segmentation tasks, we used a single RTX 3090 GPU for each. For detection tasks, to accelerate training, we utilized four parallel RTX 3090 GPUs.
Software Dependencies | No | The paper does not explicitly state specific software dependencies with version numbers (e.g., Python, PyTorch, or CUDA versions) required for replication.
Experiment Setup | No | The paper states that hyperparameter settings were adopted from previous methods (e.g., "we used the same settings as previous methods") and mentions some data augmentation and baseline replacement, but it does not explicitly list specific hyperparameter values (e.g., learning rate, batch size, epochs) or detailed training schedules in the main text.
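
The 88% parameter reduction quoted in the Research Type row is straightforward to sanity-check once the released code is set up. Below is a minimal PyTorch sketch; the model constructors named in the comments are hypothetical placeholders for the Transformer-based Point-MAE encoder and its LCM counterpart built from the repository, not the repository's actual API.

```python
import torch.nn as nn


def count_trainable_parameters(model: nn.Module) -> int:
    """Sum the element counts of all trainable parameter tensors."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


if __name__ == "__main__":
    # Toy demonstration with a stand-in module; the real comparison would
    # instantiate the baseline and LCM encoders from the released code, e.g.
    # (hypothetical names):
    #   reduction = 1.0 - (count_trainable_parameters(lcm_model)
    #                      / count_trainable_parameters(transformer_baseline))
    # A value near 0.88 would match the reported 88% parameter reduction.
    toy = nn.Linear(384, 384)
    print(count_trainable_parameters(toy))  # 384*384 + 384 = 147840
```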