Segment Any Point Cloud Sequences by Distilling Vision Foundation Models
Authors: Youquan Liu, Lingdong Kong, Jun Cen, Runnan Chen, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments conducted on eleven different point cloud datasets showcase the effectiveness and superiority of Seal. Notably, Seal achieves a remarkable 45.0% mIoU on nuScenes after linear probing, surpassing random initialization by 36.9% mIoU and outperforming prior arts by 6.1% mIoU. Moreover, Seal demonstrates significant performance gains over existing methods across 20 different few-shot fine-tuning tasks on all eleven tested point cloud datasets. |
| Researcher Affiliation | Academia | ¹Shanghai AI Laboratory; ²National University of Singapore; ³The Hong Kong University of Science and Technology; ⁴The University of Hong Kong; ⁵S-Lab, Nanyang Technological University |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | The code is available at this link². ²GitHub repo: https://github.com/youquanl/Segment-Any-Point-Cloud. |
| Open Datasets | Yes | We verify the effectiveness of our approach on eleven different point cloud datasets. (1) nuScenes [7, 26], (2) SemanticKITTI [3], and (3) Waymo Open [88] contain large-scale LiDAR scans collected from real-world driving scenes. More details of the nuScenes dataset can be found at https://www.nuscenes.org/nuscenes. |
| Dataset Splits | Yes | For model pretraining, we follow the SLidR protocol [85] for data splitting. Specifically, the nuScenes [26] dataset consists of 700 training scenes in total, 100 of which are kept aside and constitute the SLidR mini-val split. All models are pretrained using all the scans from the 600 remaining training scenes. The 100 scans in the mini-val split are used to find the best possible hyperparameters. The trained models are then validated on the official nuScenes validation set, without any test-time augmentation or model ensembling, to ensure a fair comparison with previous works and to stay in line with practical requirements. For fine-tuning on nuScenes [26], we follow the SLidR protocol to split the nuScenes train set into 1%, 5%, 10%, 25%, and 100% annotated scans for the training subsets. A splitting sketch is given after the table. |
| Hardware Specification | Yes | In our experiments, we fine-tune the entire 3D network on the semantic segmentation task using a linear combination of the cross-entropy loss and the Lovász-Softmax loss [4] as training objectives on a single A100 GPU. A sketch of this combined objective appears after the table. |
| Software Dependencies | No | The paper mentions various software components such as MinkUNet, ResNet-50, MoCoV2, PyTorch-Lightning, and mmdetection3d, but it does not specify concrete version numbers for these software dependencies, which are necessary for reproducible descriptions. |
| Experiment Setup | Yes | For the few-shot semantic segmentation tasks, the 3D networks are fine-tuned for 100 epochs with a batch size of 10 for the SemanticKITTI [3], Waymo Open [88], ScribbleKITTI [95], RELLIS-3D [49], SemanticSTF [102], SemanticPOSS [75], DAPS-3D [51], SynLiDAR [100], and Synth4D [82] datasets. For the nuScenes [26] dataset, we fine-tune the 3D network for 100 epochs with a batch size of 16 when training on the 1% annotated scans. The 3D network trained on the other portions of nuScenes is fine-tuned for 50 epochs with a batch size of 16. We adopt different learning rates for the 3D backbone Fθp and the classification head, except when Fθp is randomly initialized. For all the above-mentioned datasets except nuScenes, the learning rate of Fθp is set to 0.05 and the learning rate of the classification head is set to 2.0. On the nuScenes dataset, the learning rate of Fθp is set to 0.02. We train our framework using the SGD optimizer with a momentum of 0.9, a weight decay of 0.0001, and a dampening ratio of 0.1. A cosine annealing learning rate schedule is adopted, which decreases the learning rate from its initial value to zero at the end of training. An optimizer-configuration sketch appears after the table. |
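The SLidR-style split described in the "Dataset Splits" row can be made concrete with a short sketch. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`make_slidr_splits`, `subsample_annotated`) and the fixed seed are hypothetical, and actual reproduction should use the split files shipped with the SLidR / Seal repositories.

```python
import random

def make_slidr_splits(train_scenes, mini_val_size=100, seed=0):
    """Split the 700 nuScenes training scenes into 600 pretraining scenes
    and a 100-scene mini-val split used for hyperparameter search.
    Hypothetical helper; use the official SLidR split files in practice."""
    rng = random.Random(seed)
    scenes = list(train_scenes)
    rng.shuffle(scenes)
    return scenes[mini_val_size:], scenes[:mini_val_size]  # (pretrain, mini-val)

def subsample_annotated(train_scans, fraction, seed=0):
    """Draw the 1% / 5% / 10% / 25% / 100% annotated subsets used for fine-tuning."""
    rng = random.Random(seed)
    k = max(1, round(len(train_scans) * fraction))
    return rng.sample(train_scans, k)

# Example: 700 scene IDs -> 600 pretraining + 100 mini-val, plus a 1% subset.
pretrain_scenes, mini_val = make_slidr_splits([f"scene-{i:04d}" for i in range(700)])
one_percent = subsample_annotated(pretrain_scenes, 0.01)
```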
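The fine-tuning objective from the "Hardware Specification" row, a linear combination of cross-entropy and Lovász-Softmax, could be written as follows. The Lovász-Softmax loss is not part of core PyTorch; `lovasz_softmax` below stands in for an external implementation (e.g., the reference code accompanying [4]), the call signature follows common point cloud forks of that code, and the equal weighting of the two terms is an assumption rather than a value stated in the paper.

```python
import torch
import torch.nn.functional as F
# External dependency: a Lovász-Softmax reference implementation (not core PyTorch).
from lovasz_losses import lovasz_softmax

def segmentation_loss(logits, labels, ignore_index=255, ce_weight=1.0, lovasz_weight=1.0):
    """Linear combination of cross-entropy and Lovász-Softmax losses.
    logits: (N, C) per-point class scores; labels: (N,) integer class labels.
    The 1:1 weighting is an assumption, not a value reported in the paper."""
    ce = F.cross_entropy(logits, labels, ignore_index=ignore_index)
    lov = lovasz_softmax(F.softmax(logits, dim=1), labels, ignore=ignore_index)
    return ce_weight * ce + lovasz_weight * lov
```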
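The optimizer configuration in the "Experiment Setup" row maps directly onto PyTorch parameter groups. A minimal sketch, assuming placeholder `backbone` and `head` modules in place of the actual MinkUNet backbone Fθp and classification head; the learning rates follow the reported values for the non-nuScenes datasets.

```python
import torch
import torch.nn as nn

# Placeholder modules standing in for the MinkUNet backbone Fθp and the linear head.
backbone = nn.Linear(96, 96)
head = nn.Linear(96, 16)

# Separate learning rates for the pretrained backbone and the classification head.
optimizer = torch.optim.SGD(
    [
        {"params": backbone.parameters(), "lr": 0.05},  # 0.02 when fine-tuning on nuScenes
        {"params": head.parameters(), "lr": 2.0},
    ],
    momentum=0.9,
    weight_decay=0.0001,
    dampening=0.1,
)

# Cosine annealing from the initial learning rate down to zero over training.
epochs = 100  # 50 for the nuScenes portions larger than 1%
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs, eta_min=0.0)
```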