Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection

Authors: Linyan Huang, Zhiqi Li, Chonghao Sima, Wenhai Wang, Jingdong Wang, Yu Qiao, Hongyang Li

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct our experiments on the nuScenes dataset [3], a widely used benchmark for autonomous driving tasks. ... As presented in Tab. 1, the performance of VCD-A surpasses other cutting-edge methods, achieving a record of 44.6% and 56.6% on the nuScenes benchmark. This provides robust evidence of the effectiveness of our approach. ... To verify the effectiveness and necessity of each component, we conduct various ablation experiments on the nuScenes validation set.
Researcher Affiliation | Collaboration | Shanghai AI Lab, Nanjing University, CUHK, Baidu
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks with step-by-step instructions. Figure 2 is an 'Algorithm Overview' diagram, not pseudocode.
Open Source Code | Yes | The code will be released at https://github.com/OpenDriveLab/Birds-eye-view-Perception.
Open Datasets | Yes | We conduct our experiments on the nuScenes dataset [3], a widely used benchmark for autonomous driving tasks. [3] Holger Caesar, Varun Bankiti, Alex H. Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuScenes: A multimodal dataset for autonomous driving. In CVPR, 2020.
Dataset Splits | Yes | The dataset comprises 700 training scenes, 150 validation scenes, and 150 testing scenes.
Hardware Specification | Yes | Main experiments are trained on 8 NVIDIA A100 GPUs, while ablation experiments are conducted on 8 NVIDIA V100 GPUs.
Software Dependencies | No | The paper notes that 'The codebase is developed upon MMDetection3D [13]' but does not provide specific version numbers for MMDetection3D or other software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | For BEVDepth, the model is trained for 20 epochs with an initial learning rate of 2e-4. In the distillation process, the per-GPU batch size is set to 4, whereas during the training of the baseline model, it is set to 8.
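
The Dataset Splits row above quotes the standard nuScenes partition (700/150/150 scenes). As a minimal sketch, assuming the official nuscenes-devkit is installed, the split sizes can be checked with its create_splits_scenes helper; this code is illustrative and does not appear in the paper.

```python
# Sketch: checking the 700/150/150 nuScenes scene split quoted in the
# "Dataset Splits" row, using the official nuscenes-devkit.
from nuscenes.utils.splits import create_splits_scenes

splits = create_splits_scenes()      # dict: split name -> list of scene names
print(len(splits["train"]))          # expected: 700 training scenes
print(len(splits["val"]))            # expected: 150 validation scenes
print(len(splits["test"]))           # expected: 150 testing scenes
```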
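
The Experiment Setup row gives the quoted hyperparameters (20 epochs, initial learning rate 2e-4, per-GPU batch size 4 for distillation vs. 8 for the baseline). Below is a hypothetical MMDetection3D-style config fragment that encodes those values; the field layout and the optimizer choice are assumptions for illustration, not taken from the released code.

```python
# Hypothetical MMDetection3D-style config fragment reflecting the quoted
# "Experiment Setup" row; names and the AdamW choice are illustrative.
optimizer = dict(type="AdamW", lr=2e-4)                 # initial learning rate 2e-4 (optimizer type assumed)
runner = dict(type="EpochBasedRunner", max_epochs=20)   # BEVDepth trained for 20 epochs

# Per-GPU batch size: 4 during distillation, 8 when training the baseline.
data = dict(
    samples_per_gpu=4,   # set to 8 for the baseline (non-distillation) run
    workers_per_gpu=4,   # illustrative value; not specified in the paper
)
```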