Multi-View Representation is What You Need for Point-Cloud Pre-Training

Authors: Siming Yan, Chen Song, Youkang Kong, Qixing Huang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that our pre-trained model can be successfully transferred to various downstream tasks, including 3D shape classification, part segmentation, 3D object detection, and semantic segmentation, achieving state-of-the-art performance.
Researcher Affiliation | Collaboration | The University of Texas at Austin; Microsoft Research Asia. {siming, song, huangqx}@cs.utexas.edu, {kykdqs}@gmail.com
Pseudocode | No | The paper describes its approach using descriptive text and figures, but it does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not include an explicit statement about making the source code available or provide a link to a code repository for the described methodology.
Open Datasets | Yes | Data preparation: We choose ScanNet (Dai et al., 2017) as the pre-training dataset, which contains approximately 2.5M RGB-D scans from 1,513 indoor scenes. Following (Qi et al., 2019), we downsample 190K RGB-D scans from 1,200 video sequences in the training set. We also pre-trained our model on Objaverse (Deitke et al., 2023), which contains approximately 800K real-world 3D objects. We also pre-trained our model on ShapeNet (Chang et al., 2015) for fair comparison with previous approaches.
Dataset Splits | Yes | Semantic segmentation: We use the original training and validation splits of ScanNet (Dai et al., 2017) and report the mean Intersection over Union (IoU) on the validation split. We also evaluate on S3DIS (Armeni et al., 2017), which has six large areas, where one area is chosen as the validation set and the remaining areas are utilized for training. 3D object detection: ScanNet (Dai et al., 2017) contains instance labels for 18 object categories, with 1,201 scans for training and 312 for validation.
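The row above reports mean Intersection over Union (IoU) on the validation split. For reference, a minimal, dependency-free sketch of how mean IoU is commonly computed from a per-class confusion matrix (an illustrative implementation, not the authors' evaluation code):

```python
def mean_iou(conf):
    """Mean IoU from a square confusion matrix.

    conf[g][p] counts points with ground-truth class g predicted as
    class p. Per-class IoU is intersection / union, where union is
    (row sum + column sum - diagonal); classes absent from both the
    prediction and the ground truth are skipped.
    """
    ious = []
    for c in range(len(conf)):
        inter = conf[c][c]
        union = sum(conf[c]) + sum(row[c] for row in conf) - inter
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious)
```

For ScanNet semantic segmentation this would be averaged over the 20 evaluated classes.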
Hardware Specification | Yes | The model is trained for 200 epochs on eight 32GB Nvidia V100 GPUs, with 64 as the batch size.
Software Dependencies | No | The paper mentions 'PyTorch' as the implementation framework and the 'AdamW optimizer' and 'SGD+momentum optimizer', but does not specify exact version numbers for these or any other software components.
Experiment Setup | Yes | Network training: Our pre-training model is implemented using PyTorch, employing the AdamW optimizer (Loshchilov & Hutter, 2017) with a weight decay of 10^-4. The learning rate is set to 10^-3. The model is trained for 200 epochs on eight 32GB Nvidia V100 GPUs, with 64 as the batch size. We fine-tune our model using the SGD+momentum optimizer with a batch size of 48, 10,000 iterations, an initial learning rate of 0.01, and a polynomial-based learning rate scheduler with a power factor of 0.9.
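The fine-tuning schedule quoted above (initial learning rate 0.01, 10,000 iterations, polynomial decay with power 0.9) can be sketched as follows; this is a minimal sketch of the standard polynomial decay rule, with an illustrative function name, not the authors' code:

```python
def poly_lr(step, base_lr=0.01, total_steps=10_000, power=0.9):
    """Polynomial decay: lr = base_lr * (1 - step / total_steps) ** power."""
    return base_lr * (1.0 - step / total_steps) ** power

# The rate starts at the initial 0.01 and decays monotonically to 0
# at the final iteration.
schedule = [poly_lr(s) for s in (0, 5_000, 10_000)]
```

PyTorch ships an equivalent scheduler (`torch.optim.lr_scheduler.PolynomialLR`, which takes `total_iters` and `power` arguments), so in practice the rule need not be hand-rolled.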