Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud Pre-training
Authors: Ziyu Guo, Renrui Zhang, Longtian Qiu, Xianzhi Li, Pheng-Ann Heng
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate the performance of Joint-MAE on various downstream tasks, i.e., shape classification, few-shot classification, and part segmentation. Joint-MAE achieves superior performance on multiple downstream tasks, e.g., 92.4% accuracy for linear SVM on ModelNet40 and 86.07% accuracy on the hardest split of ScanObjectNN. |
| Researcher Affiliation | Academia | 1 Department of Computer Science and Engineering, The Chinese University of Hong Kong; 2 CUHK MMLab; 3 Huazhong University of Science and Technology; 4 Institute of Medical Intelligence and XR, The Chinese University of Hong Kong; 5 ShanghaiTech University |
| Pseudocode | No | The paper describes its method using textual descriptions and architectural diagrams (e.g., Figure 2) but does not provide any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link regarding the public availability of its source code. |
| Open Datasets | Yes | Following existing works [Pang et al., 2022; Zhang et al., 2022a], we pre-train our Joint-MAE on ShapeNet [Chang et al., 2015], which covers 57,448 3D shapes of 55 categories. We utilize a simple classification head of linear layers and evaluate the accuracy on ModelNet40 [Wu et al., 2015a] and ScanObjectNN [Uy et al., 2019] datasets, which contain synthetic objects and real-world instances, respectively. |
| Dataset Splits | Yes | we pre-train our Joint-MAE on ShapeNet [Chang et al., 2015]... we train on 9,843 instances and test on 2,468 instances with 40 categories. ScanObjectNN dataset, which consists of 2,304 objects for training and 576 objects for testing. |
| Hardware Specification | No | The paper does not specify any particular hardware components (e.g., GPU models, CPU types, or memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or library versions used for implementation or experimentation. |
| Experiment Setup | Yes | The input point number N is set as 2,048 and the depth map size H × W is set as 224 × 224. We adopt a feature dimension C of 384. Please refer to the Supplementary Material for detailed implementation. |
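
For context on the linear-SVM evaluation quoted in the Research Type row: since the paper releases no code, the sketch below only illustrates the standard protocol used by prior point-cloud pre-training work (freeze the pre-trained encoder, fit a linear SVM on extracted global features, report test accuracy). The `encoder` and data-loader names, the assumption that the encoder returns a 384-dim global feature, and the SVM regularization value are all our own placeholders, not details from the paper.

```python
# Hedged sketch of the linear-SVM evaluation protocol on ModelNet40.
# `encoder`, `train_loader`, and `test_loader` are hypothetical placeholders.
import numpy as np
import torch
from sklearn.svm import LinearSVC

@torch.no_grad()
def extract_features(encoder, loader, device="cuda"):
    """Run the frozen encoder over a dataset and collect global features."""
    encoder.eval()
    feats, labels = [], []
    for points, label in loader:          # points: (B, 2048, 3)
        f = encoder(points.to(device))    # assumed to return (B, 384) globals
        feats.append(f.cpu().numpy())
        labels.append(label.numpy())
    return np.concatenate(feats), np.concatenate(labels)

def linear_svm_accuracy(encoder, train_loader, test_loader):
    x_tr, y_tr = extract_features(encoder, train_loader)
    x_te, y_te = extract_features(encoder, test_loader)
    clf = LinearSVC(C=0.01)               # C is a common choice, not from the paper
    clf.fit(x_tr, y_tr)
    return (clf.predict(x_te) == y_te).mean()
```

The point of this protocol is that the encoder is never fine-tuned, so the SVM accuracy directly measures the quality of the pre-trained representation.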
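
To make the Experiment Setup and Dataset Splits rows concrete, here is a minimal configuration sketch collecting only the numbers the paper states; the field names are our own, and anything not listed here is deferred by the authors to their Supplementary Material.

```python
# Minimal configuration sketch of the stated hyperparameters and split sizes.
# Field names are hypothetical; values are taken from the table rows above.
from dataclasses import dataclass

@dataclass
class JointMAEConfig:
    num_points: int = 2048       # input point number N
    depth_map_size: int = 224    # projected depth map, H = W = 224
    feat_dim: int = 384          # feature dimension C
    # Dataset statistics quoted in the table:
    shapenet_shapes: int = 57448     # pre-training set, 55 categories
    modelnet40_train: int = 9843
    modelnet40_test: int = 2468
    scanobjectnn_train: int = 2304
    scanobjectnn_test: int = 576
```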