Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models
Authors: Zhimin Chen, Longlong Jing, Yingwei Li, Bing Li
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on multiple datasets, including SUN RGB-D [75] and ScanNet [14] for 3D object detection and S3DIS [5] for 3D semantic segmentation. Our approach outperforms state-of-the-art self-supervised learning methods in both tasks, demonstrating the effectiveness of our proposed framework. |
| Researcher Affiliation | Academia | Zhimin Chen (Clemson University, zhiminc@clemson.edu); Longlong Jing (The City University of New York, ljing@gradcenter.cuny.edu); Yingwei Li (Johns Hopkins University, yingwei.li@jhu.edu); Bing Li (Clemson University, bli4@clemson.edu) |
| Pseudocode | No | No structured pseudocode or algorithm blocks are present in the paper. |
| Open Source Code | Yes | Code will be available at: https://github.com/Zhimin-C/Bridge3D |
| Open Datasets | Yes | We evaluate our method on multiple datasets, including SUN RGB-D [75] and ScanNet [14] for 3D object detection and S3DIS [5] for 3D semantic segmentation. |
| Dataset Splits | No | The paper does not explicitly provide specific training/validation/test dataset splits (e.g., percentages, sample counts, or references to predefined splits) needed for reproduction. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions models and optimizers used (e.g., 'Point-MAE', 'DINOv2', 'CLIP ViT-B', 'Tag2Text', 'AdamW') but does not specify software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | Pre-training. During this stage, we perform training of the model for 120 epochs by employing the ScanNet dataset [14]... We use AdamW [45] optimizer with a base learning rate of 5e-4 and weight decay of 5e-2, along with a batch size of 64. The whole masking ratio r_w is set to 70% and the drop ratio r_d is set to 40%. The cosine learning rate scheduler is applied, with a drop path rate and warm-up epochs set to 0.1 and 10, respectively. The encoder depth is set to 6, and we utilize the same decoder as Point-MAE [50], with the decoder depth set to 2. |
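
The hyperparameters quoted above are specific enough to collect into a configuration sketch. The snippet below is a minimal, illustrative PyTorch setup of the optimizer and learning-rate schedule only, assuming PyTorch as the framework (the paper does not state software versions); the placeholder model, constant names, and the warm-up/cosine schedule implementation are assumptions for illustration, not the authors' code (which is promised at https://github.com/Zhimin-C/Bridge3D).

```python
# Minimal sketch of the pre-training optimization setup quoted above.
# Assumes PyTorch; the placeholder module stands in for the Bridge3D
# encoder (depth 6) / Point-MAE-style decoder (depth 2), which are not
# reproduced here.
import math
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

EPOCHS = 120              # pre-training epochs on ScanNet
WARMUP_EPOCHS = 10        # warm-up epochs
BASE_LR = 5e-4            # base learning rate
WEIGHT_DECAY = 5e-2       # weight decay
BATCH_SIZE = 64
WHOLE_MASK_RATIO = 0.70   # whole masking ratio r_w
DROP_RATIO = 0.40         # drop ratio r_d
DROP_PATH_RATE = 0.1

# Placeholder for the actual masked-autoencoding model (hypothetical stand-in).
model = torch.nn.Linear(8, 8)

optimizer = AdamW(model.parameters(), lr=BASE_LR, weight_decay=WEIGHT_DECAY)

def warmup_cosine(epoch: int) -> float:
    """Linear warm-up for WARMUP_EPOCHS, then cosine decay to zero."""
    if epoch < WARMUP_EPOCHS:
        return (epoch + 1) / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / (EPOCHS - WARMUP_EPOCHS)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = LambdaLR(optimizer, lr_lambda=warmup_cosine)

for epoch in range(EPOCHS):
    # ... one training epoch over ScanNet with batch size BATCH_SIZE ...
    scheduler.step()
```

Note that the masking and drop ratios would be consumed by the masking logic inside the model rather than by the optimizer; they are listed here only to gather the quoted hyperparameters in one place.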