Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection

Authors: Linyan Huang, Zhiqi Li, Chonghao Sima, Wenhai Wang, Jingdong Wang, Yu Qiao, Hongyang Li

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct our experiments on the nuScenes dataset [3], a widely used benchmark for autonomous driving tasks. ... As presented in Tab. 1, the performance of VCD-A surpasses other cutting-edge methods, achieving a record of 44.6% and 56.6% on the nuScenes benchmark. This provides robust evidence of the effectiveness of our approach. ... To verify the effectiveness and necessity of each component, we conduct various ablation experiments on the nuScenes validation set.
Researcher Affiliation | Collaboration | Shanghai AI Lab, Nanjing University, CUHK, Baidu
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks with step-by-step instructions. Figure 2 is an 'Algorithm Overview' diagram, not pseudocode.
Open Source Code | Yes | The code will be released at https://github.com/OpenDriveLab/Birds-eye-view-Perception.
Open Datasets | Yes | We conduct our experiments on the nuScenes dataset [3], a widely used benchmark for autonomous driving tasks. [3] Holger Caesar, Varun Bankiti, Alex H. Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuScenes: A multimodal dataset for autonomous driving. In CVPR, 2020.
Dataset Splits | Yes | The dataset comprises 700 training scenes, 150 validation scenes, and 150 testing scenes.
Hardware Specification | Yes | Main experiments are trained on 8 NVIDIA A100 GPUs, while ablation experiments are conducted on 8 NVIDIA V100 GPUs.
Software Dependencies | No | The paper notes that 'The codebase is developed upon MMDetection3D [13]' but does not provide specific version numbers for MMDetection3D or other software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | For BEVDepth, the model is trained for 20 epochs with an initial learning rate of 2e-4. In the distillation process, the per-GPU batch size is set to 4, whereas during the training of the baseline model, it is set to 8.
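
The Dataset Splits row above quotes the standard nuScenes partition (700/150/150 scenes). As a minimal sketch, assuming the official nuscenes-devkit is installed, the split sizes can be checked with its create_splits_scenes helper; this code is illustrative and does not appear in the paper.

```python
# Sketch: checking the 700/150/150 nuScenes scene split quoted in the
# "Dataset Splits" row, using the official nuscenes-devkit.
from nuscenes.utils.splits import create_splits_scenes

splits = create_splits_scenes()      # dict: split name -> list of scene names
print(len(splits["train"]))          # expected: 700 training scenes
print(len(splits["val"]))            # expected: 150 validation scenes
print(len(splits["test"]))           # expected: 150 testing scenes
```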
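
The Experiment Setup row gives the quoted hyperparameters (20 epochs, initial learning rate 2e-4, per-GPU batch size 4 for distillation vs. 8 for the baseline). Below is a hypothetical MMDetection3D-style config fragment that encodes those values; the field layout and the optimizer choice are assumptions for illustration, not taken from the released code.

```python
# Hypothetical MMDetection3D-style config fragment reflecting the quoted
# "Experiment Setup" row; names and the AdamW choice are illustrative.
optimizer = dict(type="AdamW", lr=2e-4)                 # initial learning rate 2e-4 (optimizer type assumed)
runner = dict(type="EpochBasedRunner", max_epochs=20)   # BEVDepth trained for 20 epochs

# Per-GPU batch size: 4 during distillation, 8 when training the baseline.
data = dict(
    samples_per_gpu=4,   # set to 8 for the baseline (non-distillation) run
    workers_per_gpu=4,   # illustrative value; not specified in the paper
)
```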