Self-Supervised Pretraining for Large-Scale Point Clouds
Authors: Zaiwei Zhang, Min Bai, Li Erran Li
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we propose a new self-supervised pretraining method that targets large-scale 3D scenes. We pretrain commonly used point-based and voxel-based model architectures and show the transfer learning performance on 3D object detection and semantic segmentation. We demonstrate the effectiveness of our approach on both dense 3D indoor point clouds and sparse outdoor LiDAR point clouds. |
| Researcher Affiliation | Industry | Zaiwei Zhang, AWS AI, Santa Clara, CA 95054, zaiweiz@amazon.com; Min Bai, AWS AI, Santa Clara, CA 95054, baimin@amazon.com; Li Erran Li, AWS AI, Santa Clara, CA 95054, lilimam@amazon.com |
| Pseudocode | Yes | Algorithm 1: Training Framework for SSPL |
| Open Source Code | No | We provide the detailed information about our model architecture in the supplementary material. Our code is still under internal review and we will release it as soon as it is approved internally. |
| Open Datasets | Yes | In our work, we choose the ScanNet [14] dataset for pretraining. ScanNet contains 1500 large-scale indoor scenes... We use two large-scale, well-established public datasets for this task, which contain a large variety of scenes. SemanticKITTI (SK) [5] is based on the original KITTI [18] dataset... The more recent Waymo Open Dataset (WOD) [57] provides a further leap in the scale and variability of data... |
| Dataset Splits | Yes | To study the label efficiency of our pretrained models, we also subsample different sets of the training data used for finetuning. We follow the setup in VoteNet [45] for finetuning. Based on the results in Table 1, our approach significantly improves downstream object detection performance across all settings, with up to 4.4% improvement on ScanNet and 2.7% on SUN RGB-D. Moreover, on ScanNet, our pretraining approach together with 20% of labeled data allows the downstream detection task to achieve the equivalent performance of using 50% of labeled data and training from scratch. |
| Hardware Specification | No | The paper states, 'We have provided this information in the supplementary material,' under question 3d, but no specific hardware details are present within the main body of the paper. |
| Software Dependencies | No | The paper mentions deep learning frameworks and models (e.g., PointNet++, 3D U-Net, VoteNet) but does not provide specific software version numbers for the libraries or environments used for implementation. |
| Experiment Setup | Yes | In our experiments, we set the size of global feature queue to be 300K. We use a temperature value of 0.1 while computing the non-parametric softmax in Eq 1 and 3. The local and global contrastive loss are equally weighted. For training, we use a standard SGD optimizer with momentum 0.9, and we use a cosine learning rate scheduler [38] which decreases from 0.06 to 0.00006 and train the model for 500 epochs with a batch size of 96. |
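
The quoted hyperparameters are enough to reconstruct the optimization setup in outline. Below is a minimal sketch assuming a PyTorch implementation (the paper does not state the framework); the names `contrastive_loss`, `query`, `key`, `queue`, and the placeholder backbone are illustrative and not taken from the paper, while the numeric values (temperature 0.1, 300K global queue, SGD with momentum 0.9, cosine schedule from 0.06 to 0.00006, 500 epochs, batch size 96) come from the quoted experiment setup.

```python
# Minimal sketch of the reported optimization settings, assuming a PyTorch
# implementation (not the authors' released code). Names are illustrative.
import torch
import torch.nn.functional as F

def contrastive_loss(query, key, queue, temperature=0.1):
    """Non-parametric softmax (InfoNCE-style) loss with temperature 0.1.

    query, key: (N, D) L2-normalized embeddings of matched views.
    queue: (K, D) L2-normalized negatives (the paper reports a 300K global queue).
    """
    pos = torch.einsum("nd,nd->n", query, key).unsqueeze(1)   # (N, 1) positive logits
    neg = torch.einsum("nd,kd->nk", query, queue)             # (N, K) negative logits
    logits = torch.cat([pos, neg], dim=1) / temperature
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)

# Reported settings: SGD with momentum 0.9, cosine learning-rate schedule
# decreasing from 0.06 to 0.00006, 500 epochs, batch size 96.
backbone = torch.nn.Linear(128, 128)  # placeholder for the point/voxel backbone
optimizer = torch.optim.SGD(backbone.parameters(), lr=0.06, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=500, eta_min=0.00006)

# Quick check on random, L2-normalized embeddings.
q = F.normalize(torch.randn(8, 128), dim=1)
k = F.normalize(torch.randn(8, 128), dim=1)
bank = F.normalize(torch.randn(1024, 128), dim=1)  # stand-in for the 300K queue
print(contrastive_loss(q, k, bank).item())

# Training loop (elided): per epoch, iterate over batches of size 96, compute the
# equally weighted local + global contrastive losses, backpropagate, step the
# optimizer, and call scheduler.step() once per epoch for 500 epochs.
```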