CALICO: Self-Supervised Camera-LiDAR Contrastive Pre-training for BEV Perception

Authors: Jiachen Sun, Haizhong Zheng, Qingzhao Zhang, Atul Prakash, Zhuoqing Mao, Chaowei Xiao

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "CALICO's efficacy is substantiated by extensive evaluations on 3D object detection and BEV map segmentation tasks, where it delivers significant performance improvements. Notably, CALICO outperforms the baseline method by 10.5% and 8.6% on NDS and mAP." And, from Section 3 (Experiments and Results): "In this section, we introduce the evaluation of CALICO with a breakdown of contributions from different components. We first describe the experimental setup in 3.1 and detail the evaluation results in 3.2 and 3.3. Lastly, we conduct a comprehensive analysis and ablation studies of CALICO in 3.4 and 3.5."
Researcher Affiliation | Collaboration | Jiachen Sun^1, Haizhong Zheng^1, Qingzhao Zhang^1, Atul Prakash^1, Z. Morley Mao^1, and Chaowei Xiao^2,3 (^1 University of Michigan; ^2 University of Wisconsin-Madison; ^3 NVIDIA)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | "We utilize the nuScenes dataset (Caesar et al., 2020) to evaluate our method. We additionally leverage the Waymo dataset (Sun et al., 2020b) to demonstrate the generalizability and transferability of our CALICO."
Dataset Splits | Yes | "We finetune our CALICO pretrained model using 20% of labeled examples from the training set for 30 epochs and evaluate the performance on the validation set. We leverage {5%, 10%, 20%, 50%} of the training set with annotations to further finetune the model with a detection or segmentation head attached for another 20 epochs." (A subsampling sketch follows the table.)
Hardware Specification | Yes | "All experiments are conducted on 4 V100 GPUs with 32GB memory" (NVIDIA V100, https://www.nvidia.com/en-us/data-center/v100/).
Software Dependencies | No | The paper mentions software components like the AdamW optimizer but does not provide specific version numbers for any libraries, frameworks, or programming languages used in the experiments.
Experiment Setup | Yes | "The DBSCAN in our semantic pooling has a minimum of 5 points and a distance of 0.75 meters for clustering. ... The output dimension of the projectors is set as 128. ... The temperature factor in Equations 1 and 2 is set to τ = 0.07. During pretraining, we sample N = 1024 semantic-rich and M = 1024 semantic-less points and set α = 0.5. We pretrain f_LiDAR and f_Camera using PRC and RAD, both for 20 epochs, on the entire training set. ... We employed the AdamW optimizer with a cyclic scheduler and a starting learning rate of 2 × 10^-4. A gradient maximum clip of 35 was used." (A configuration sketch follows the table.)
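The label-efficient finetuning protocol in the Dataset Splits row amounts to drawing fixed fractions of the annotated training set. Below is a minimal sketch of one way to build such subsets, assuming uniform random sampling with a fixed seed; the paper does not state its sampling strategy, and the helper name and sample count are illustrative.

```python
# Hypothetical helper for the {5%, 10%, 20%, 50%} finetuning splits; the
# uniform-random strategy and the fixed seed are assumptions, not from the paper.
import random

def subsample_indices(num_train: int, fraction: float, seed: int = 0) -> list[int]:
    """Return a reproducible random subset of training-sample indices."""
    rng = random.Random(seed)
    k = max(1, int(num_train * fraction))
    return sorted(rng.sample(range(num_train), k))

# e.g. a 20% split of the 28,130-keyframe nuScenes training set
indices_20 = subsample_indices(28130, 0.20)
```

Sampling at the scene level rather than the frame level would avoid correlated frames leaking across the labeled/unlabeled boundary; the paper does not specify which granularity it uses.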
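The Experiment Setup row pins down most of the pretraining hyperparameters. The sketch below collects them in one place using PyTorch and scikit-learn; the parameters stand in for f_LiDAR, f_Camera, and the projectors, and the cyclic scheduler's max_lr and step size are assumptions, since the paper reports only the starting learning rate.

```python
# Minimal sketch of the reported CALICO pretraining settings. Placeholder
# parameters stand in for f_LiDAR, f_Camera, and the projectors; max_lr and
# step_size_up in the cyclic scheduler are assumptions.
import torch
import torch.nn.functional as F
from sklearn.cluster import DBSCAN

TAU = 0.07       # temperature in Equations 1 and 2
N_RICH = 1024    # semantic-rich points sampled during pretraining
M_LESS = 1024    # semantic-less points sampled during pretraining
ALPHA = 0.5      # reported loss-balancing constant
PROJ_DIM = 128   # projector output dimension

# Semantic pooling: cluster LiDAR points with the reported DBSCAN settings
# (at least 5 points per cluster, 0.75 m neighborhood radius).
dbscan = DBSCAN(eps=0.75, min_samples=5)

def info_nce(q: torch.Tensor, k: torch.Tensor, tau: float = TAU) -> torch.Tensor:
    """Generic InfoNCE over L2-normalized feature rows (B, D); matched rows
    of q and k are treated as positive pairs, all other rows as negatives."""
    q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)
    logits = q @ k.t() / tau
    targets = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, targets)

# Placeholder parameters; in CALICO these would come from both encoders
# and the 128-dimensional projectors.
params = [torch.nn.Parameter(torch.zeros(PROJ_DIM))]
optimizer = torch.optim.AdamW(params, lr=2e-4)
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=2e-4, max_lr=2e-3,      # max_lr is an assumption
    step_size_up=1000, cycle_momentum=False)   # AdamW has no momentum to cycle

# Per training step (after loss.backward()): clip gradients at norm 35.
torch.nn.utils.clip_grad_norm_(params, max_norm=35)
```

Note that ALPHA only records the reported balancing constant; how the PRC and RAD objectives are actually combined follows the paper's equations and is not reproduced in this sketch.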