Multi-View Representation is What You Need for Point-Cloud Pre-Training
Authors: Siming Yan, Chen Song, Youkang Kong, Qixing Huang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that our pre-trained model can be successfully transferred to various downstream tasks, including 3D shape classification, part segmentation, 3D object detection, and semantic segmentation, achieving state-of-the-art performance. |
| Researcher Affiliation | Collaboration | The University of Texas at Austin, Microsoft Research Asia {siming, song, huangqx}@cs.utexas.edu, {kykdqs}@gmail.com |
| Pseudocode | No | The paper describes its approach using descriptive text and figures, but it does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include an explicit statement about making the source code available or provide a link to a code repository for the described methodology. |
| Open Datasets | Yes | Data preparation We choose ScanNet (Dai et al., 2017) as the pre-training dataset, which contains approximately 2.5M RGB-D scans from 1,513 indoor scenes. Following (Qi et al., 2019), we downsample 190K RGB-D scans from 1,200 video sequences in the training set. We also pre-trained our model on Objaverse (Deitke et al., 2023), which contains approximately 800K real-world 3D objects. We also pre-trained our model on ShapeNet (Chang et al., 2015) for fair comparison with previous approaches. |
| Dataset Splits | Yes | Semantic segmentation We use the original training and validation splits of ScanNet (Dai et al., 2017) and report the mean Intersection over Union (IoU) on the validation split. We also evaluate on S3DIS (Armeni et al., 2017) that has six large areas, where one area is chosen as the validation set, and the remaining areas are utilized for training. 3D object detection ScanNet (Dai et al., 2017) contains instance labels for 18 object categories, with 1,201 scans for training and 312 for validation. |
| Hardware Specification | Yes | The model is trained for 200 epochs on eight 32GB Nvidia V100 GPUs, with 64 as the batch size. |
| Software Dependencies | No | The paper mentions 'PyTorch' as the implementation framework and 'AdamW optimizer' and 'SGD+momentum optimizer' but does not specify exact version numbers for these or any other software components. |
| Experiment Setup | Yes | Network training Our pre-training model is implemented using PyTorch, employing the AdamW optimizer (Loshchilov & Hutter, 2017) with a weight decay of 10⁻⁴. The learning rate is set to 10⁻³. The model is trained for 200 epochs on eight 32GB Nvidia V100 GPUs, with 64 as the batch size. We fine-tune our model using the SGD+momentum optimizer with a batch size of 48, 10,000 iterations, an initial learning rate of 0.01, and a polynomial-based learning rate scheduler with a power factor of 0.9. |
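The fine-tuning schedule quoted above (initial learning rate 0.01, 10,000 iterations, polynomial decay with power 0.9) can be sketched as a small helper. This is a minimal sketch, not the authors' code: it assumes the scheduler follows the standard polynomial-decay formula, which matches the semantics of PyTorch's `torch.optim.lr_scheduler.PolynomialLR`.

```python
def polynomial_lr(step, base_lr=0.01, total_iters=10_000, power=0.9):
    """Polynomial learning-rate decay as described in the paper's
    fine-tuning setup: lr(step) = base_lr * (1 - step/total_iters)^power.
    Values taken from the quoted Experiment Setup row; the formula itself
    is the conventional polynomial schedule (an assumption, not quoted).
    """
    frac = min(step, total_iters) / total_iters
    return base_lr * (1.0 - frac) ** power

# The schedule starts at the initial learning rate and decays to zero:
print(polynomial_lr(0))       # 0.01 at the first iteration
print(polynomial_lr(10_000))  # 0.0 at the final iteration
```

In a PyTorch training loop this would typically be realized by attaching `PolynomialLR(optimizer, total_iters=10_000, power=0.9)` to an `SGD(momentum=...)` optimizer and calling `scheduler.step()` once per iteration.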