Fine-grained Image-to-LiDAR Contrastive Distillation with Visual Foundation Models

Authors: Yifan Zhang, Junhui Hou

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that our approach mitigates the challenges posed by traditional methods and consistently surpasses existing image-to-LiDAR contrastive distillation methods in downstream tasks. Code is available at https://github.com/Eaphan/OLIVINE.
Researcher Affiliation | Academia | Yifan Zhang and Junhui Hou, Department of Computer Science, City University of Hong Kong; yzhang3362-c@my.cityu.edu.hk; jh.hou@cityu.edu.hk
Pseudocode | No | The paper describes the proposed method in detail in Section 3 and its subsections, outlining processes such as 'weakly-supervised contrastive distillation' and a 'density- and category-aware sampling strategy.' While these descriptions explain the procedural steps of the method, they are given in paragraph form and do not include structured pseudocode blocks or figures explicitly labeled as 'Pseudocode' or 'Algorithm'. (A generic sketch of a contrastive distillation loss is given after this table.)
Open Source Code | Yes | Code is available at https://github.com/Eaphan/OLIVINE.
Open Datasets | Yes | nuScenes Dataset. The nuScenes dataset, compiled from driving recordings in Boston and Singapore, utilizes a vehicle equipped with a 32-beam LiDAR and additional sensing technologies [6]. ... SemanticKITTI Dataset. The SemanticKITTI (SK) dataset features paired RGB images and point cloud data derived from KITTI's urban scenes, specifically designed for semantic segmentation tasks [2].
Dataset Splits | Yes | The nuScenes-lidarseg and SemanticKITTI datasets contain 16 and 19 semantic categories for validation, respectively. ... In line with standard practice, the dataset is divided into training and validation sets, with 10 sequences designated for training and the eighth sequence reserved for validation. (See the split sketch after this table.)
Hardware Specification | Yes | The 3D network is pre-trained for these 50 epochs on four NVIDIA 3090 GPUs, processing a total batch size of 16, unless specified otherwise.
Software Dependencies | No | The paper lists software components such as 'Minkowski Engine,' 'OpenPCDet,' and 'PyTorch Lightning' in Section A.8, 'Public Resources Used.' However, it does not specify version numbers for these dependencies (e.g., 'PyTorch 1.9' or 'OpenPCDet v0.5'), which are needed for a reproducible experimental setup. (A small version-recording sketch follows the table.)
Experiment Setup | Yes | We utilize momentum SGD for optimization, setting the initial learning rate at 0.5 and 0.01 for SR-UNet34 and VoxelNet respectively, with a momentum of 0.9 and a weight decay of 1e-4. To adjust the learning rate, we employ a cosine annealing scheduler [5] that gradually reduces it from the initial value to 0 over 50 epochs. The 3D network is pre-trained for these 50 epochs on four NVIDIA 3090 GPUs, processing a total batch size of 16, unless specified otherwise. (A minimal optimizer/scheduler sketch follows the table.)
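
Since the paper provides no pseudocode, the following is a minimal, generic sketch of an InfoNCE-style image-to-LiDAR contrastive distillation loss, included only to make the Pseudocode row concrete. It is not the paper's weakly-supervised formulation and omits its density- and category-aware sampling; the feature shapes and the temperature value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_distillation_loss(point_feats, pixel_feats, temperature=0.07):
    """Generic InfoNCE loss over N matched point-pixel feature pairs, each of shape (N, D)."""
    p = F.normalize(point_feats, dim=1)   # 3D (student) features
    q = F.normalize(pixel_feats, dim=1)   # 2D (teacher) features from the image branch
    logits = p @ q.t() / temperature      # (N, N) cosine-similarity matrix
    targets = torch.arange(p.size(0), device=p.device)  # positives lie on the diagonal
    return F.cross_entropy(logits, targets)
```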
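
For the Dataset Splits row, the standard SemanticKITTI protocol quoted above (10 training sequences, sequence 08 held out for validation) can be written out as below. The helper name and directory layout follow the usual SemanticKITTI release structure and are assumptions, not code from the OLIVINE repository.

```python
from pathlib import Path

# Standard SemanticKITTI split: sequences 00-10 minus 08 for training, 08 for validation.
TRAIN_SEQUENCES = ["00", "01", "02", "03", "04", "05", "06", "07", "09", "10"]
VAL_SEQUENCES = ["08"]

def list_scans(root, sequences):
    """Collect the .bin LiDAR scans of the given sequences (assumed sequences/<seq>/velodyne layout)."""
    root = Path(root)
    return sorted(p for seq in sequences
                  for p in (root / "sequences" / seq / "velodyne").glob("*.bin"))
```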
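
Because the Software Dependencies row notes that no version numbers are reported, a small script such as the following (an illustrative sketch, not part of the released code) can at least record the versions actually installed when reproducing the experiments. The distribution names are the usual PyPI names; OpenPCDet is typically built from source and may not be resolvable this way.

```python
from importlib.metadata import version, PackageNotFoundError

PACKAGES = ["torch", "pytorch-lightning", "MinkowskiEngine"]

for name in PACKAGES:
    try:
        print(f"{name}=={version(name)}")
    except PackageNotFoundError:
        print(f"{name}: not found as an installed distribution (possibly built from source)")
```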
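
Finally, the Experiment Setup row translates almost directly into a PyTorch optimizer/scheduler configuration. This is a minimal sketch assuming a plain training loop and a placeholder model; the actual backbones (SR-UNet34 / VoxelNet), data loading, and distillation losses are in the released code.

```python
import torch

model = torch.nn.Linear(16, 16)  # placeholder for the 3D backbone (SR-UNet34 or VoxelNet)

# Momentum SGD: lr 0.5 for SR-UNet34 (0.01 for VoxelNet), momentum 0.9, weight decay 1e-4.
optimizer = torch.optim.SGD(model.parameters(), lr=0.5, momentum=0.9, weight_decay=1e-4)

# Cosine annealing from the initial learning rate down to 0 over the 50 pre-training epochs.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50, eta_min=0.0)

for epoch in range(50):
    # ... one epoch over the total-batch-size-16 loader on four GPUs ...
    scheduler.step()
```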