MM-Point: Multi-View Information-Enhanced Multi-Modal Self-Supervised 3D Point Cloud Understanding
Authors: Hai-Tao Yu, Mofei Song
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | MM-Point demonstrates state-of-the-art (SOTA) performance in various downstream tasks. For instance, it achieves a peak accuracy of 92.4% on the synthetic dataset ModelNet40, and a top accuracy of 87.8% on the real-world dataset ScanObjectNN, comparable to fully supervised methods. Additionally, we demonstrate its effectiveness in tasks such as few-shot classification, 3D part segmentation and 3D semantic segmentation. In this section, we first introduce the pre-training details of MM-Point. As our focus is on 3D representation learning, we only evaluate the pre-trained 3D point cloud encoder backbones. We sample different downstream tasks and assess the 3D feature representations learned by MM-Point. Ablation Study In order to investigate the contributions of each main component in MM-Point, we conduct an extensive ablation study. |
| Researcher Affiliation | Academia | 1 School of Cyber Science and Engineering, Southeast University, Nanjing, China 2 School of Computer Science and Engineering, Southeast University, Nanjing, China 3 Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China |
| Pseudocode | No | The paper describes the methods in prose and with schematic figures, but no explicit pseudocode or algorithm blocks are provided. |
| Open Source Code | Yes | Codes are available at https://github.com/HaydenYu/MM-Point. |
| Open Datasets | Yes | Datasets ShapeNet (Chang et al. 2015) is a large-scale 3D shape dataset containing 51162 synthetic 3D point cloud objects. ModelNet40 (Wu et al. 2015) is a synthetic point cloud dataset obtained by sampling 3D CAD models, consisting of 12311 3D objects. ModelNet10 (Qi et al. 2016) includes 4899 CAD models with orientations from 10 categories. ScanObjectNN (Uy et al. 2019) is a real-world 3D object dataset comprising 2880 unique point cloud objects. |
| Dataset Splits | No | For evaluation, we train our model from scratch on Areas 1-4 and Area 6, using Area 5 for validation. To evaluate the effectiveness of the point cloud representation learned by MM-Point, we first performed random sampling of 1024 points for each object. The paper does not specify explicit train/validation/test splits (e.g., 80/10/10) for ModelNet or ScanObjectNN datasets, nor does it cite predefined splits for them in the context of the main model training. |
| Hardware Specification | No | The paper mentions 'We employ DGCNN (Wang et al. 2019) as the 3D backbone. For the image modality, we use ResNet-50 as the 2D backbone. Pretraining employs AdamW as the optimizer.' and 'This research work was also supported by the Big Data Computing Center of Southeast University'. However, no specific hardware details like GPU/CPU models or memory are provided. |
| Software Dependencies | No | The paper mentions 'We employ DGCNN (Wang et al. 2019) as the 3D backbone. For the image modality, we use ResNet-50 as the 2D backbone. Pretraining employs AdamW as the optimizer.' No version numbers for any software, libraries, or frameworks are specified. |
| Experiment Setup | Yes | Implementation Details For the point cloud modality, we employ DGCNN (Wang et al. 2019) as the 3D backbone. For the image modality, we use ResNet-50 as the 2D backbone. For all encoders, we append a 2-layer non-linear MLP projection head to generate the final representation. Note that we add different projection heads to obtain features. Pretraining employs AdamW as the optimizer. To evaluate the effectiveness of the point cloud representation learned by MM-Point, we first performed random sampling of 1024 points for each object. For evaluation, we train our model from scratch on Areas 1-4 and Area 6, using Area 5 for validation. |
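
The Experiment Setup row states that 1024 points are randomly sampled per object before evaluation. A minimal sketch of such a sampling step is below; the paper does not specify the sampling scheme, so uniform sampling without replacement (and resampling with replacement for undersized clouds) is an assumption here, and `sample_points` is a hypothetical helper name, not from the paper's code.

```python
import random

def sample_points(points, n=1024, seed=None):
    """Randomly sample n points from a point cloud (a list of (x, y, z) tuples).

    Assumption: uniform sampling without replacement; if the cloud has fewer
    than n points, resample with replacement so the output size is always n.
    """
    rng = random.Random(seed)
    if len(points) >= n:
        return rng.sample(points, n)
    return [rng.choice(points) for _ in range(n)]

# Toy stand-in cloud of 2000 random points (not real dataset data).
rng = random.Random(0)
cloud = [(rng.random(), rng.random(), rng.random()) for _ in range(2000)]
sampled = sample_points(cloud, n=1024, seed=0)
print(len(sampled))  # 1024
```

Fixing the seed makes the sampled subset reproducible across runs, which matters when comparing downstream evaluation numbers.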