Multi-View Representation is What You Need for Point-Cloud Pre-Training

Authors: Siming Yan, Chen Song, Youkang Kong, Qixing Huang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that our pre-trained model can be successfully transferred to various downstream tasks, including 3D shape classification, part segmentation, 3D object detection, and semantic segmentation, achieving state-of-the-art performance.
Researcher Affiliation | Collaboration | The University of Texas at Austin; Microsoft Research Asia. {siming, song, huangqx}@cs.utexas.edu, {kykdqs}@gmail.com
Pseudocode | No | The paper describes its approach using descriptive text and figures, but it does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not include an explicit statement about making the source code available or provide a link to a code repository for the described methodology.
Open Datasets | Yes | Data preparation: We choose ScanNet (Dai et al., 2017) as the pre-training dataset, which contains approximately 2.5M RGB-D scans from 1,513 indoor scenes. Following (Qi et al., 2019), we downsample 190K RGB-D scans from 1,200 video sequences in the training set. We also pre-trained our model on Objaverse (Deitke et al., 2023), which contains approximately 800K real-world 3D objects. We also pre-trained our model on ShapeNet (Chang et al., 2015) for fair comparison with previous approaches.
Dataset Splits | Yes | Semantic segmentation: We use the original training and validation splits of ScanNet (Dai et al., 2017) and report the mean Intersection over Union (IoU) on the validation split. We also evaluate on S3DIS (Armeni et al., 2017), which has six large areas, where one area is chosen as the validation set and the remaining areas are utilized for training. 3D object detection: ScanNet (Dai et al., 2017) contains instance labels for 18 object categories, with 1,201 scans for training and 312 for validation.
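The row above reports mean Intersection over Union (IoU) on the validation split. For reference, a minimal, dependency-free sketch of how mean IoU is commonly computed from a per-class confusion matrix (an illustrative implementation, not the authors' evaluation code):

```python
def mean_iou(conf):
    """Mean IoU from a square confusion matrix.

    conf[g][p] counts points with ground-truth class g predicted as
    class p. Per-class IoU is intersection / union, where union is
    (row sum + column sum - diagonal); classes absent from both the
    prediction and the ground truth are skipped.
    """
    ious = []
    for c in range(len(conf)):
        inter = conf[c][c]
        union = sum(conf[c]) + sum(row[c] for row in conf) - inter
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious)
```

For ScanNet semantic segmentation this would be averaged over the 20 evaluated classes.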
Hardware Specification | Yes | The model is trained for 200 epochs on eight 32GB Nvidia V100 GPUs, with 64 as the batch size.
Software Dependencies | No | The paper mentions 'PyTorch' as the implementation framework and the 'AdamW optimizer' and 'SGD+momentum optimizer', but does not specify exact version numbers for these or any other software components.
Experiment Setup | Yes | Network training: Our pre-training model is implemented using PyTorch, employing the AdamW optimizer (Loshchilov & Hutter, 2017) with a weight decay of 10^-4. The learning rate is set to 10^-3. The model is trained for 200 epochs on eight 32GB Nvidia V100 GPUs, with 64 as the batch size. We fine-tune our model using the SGD+momentum optimizer with a batch size of 48, 10,000 iterations, an initial learning rate of 0.01, and a polynomial-based learning rate scheduler with a power factor of 0.9.
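The fine-tuning schedule quoted above (initial learning rate 0.01, 10,000 iterations, polynomial decay with power 0.9) can be sketched as follows; this is a minimal sketch of the standard polynomial decay rule, with an illustrative function name, not the authors' code:

```python
def poly_lr(step, base_lr=0.01, total_steps=10_000, power=0.9):
    """Polynomial decay: lr = base_lr * (1 - step / total_steps) ** power."""
    return base_lr * (1.0 - step / total_steps) ** power

# The rate starts at the initial 0.01 and decays monotonically to 0
# at the final iteration.
schedule = [poly_lr(s) for s in (0, 5_000, 10_000)]
```

PyTorch ships an equivalent scheduler (`torch.optim.lr_scheduler.PolynomialLR`, which takes `total_iters` and `power` arguments), so in practice the rule need not be hand-rolled.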