BEVDistill: Cross-Modal BEV Distillation for Multi-View 3D Object Detection

Authors: Zehui Chen, Zhenyu Li, Shiquan Zhang, Liangji Fang, Qinhong Jiang, Feng Zhao

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that the proposed method outperforms current KD approaches on a highly-competitive baseline, BEVFormer, without introducing any extra cost in the inference phase. Notably, our best model achieves 59.4 NDS on the nuScenes test leaderboard, achieving new state-of-the-arts in comparison with various image-based detectors.
Researcher Affiliation | Collaboration | Zehui Chen¹, Zhenyu Li², Shiquan Zhang³, Liangji Fang³, Qinhong Jiang³, Feng Zhao¹. ¹University of Science and Technology of China; ²Harbin Institute of Technology; ³SenseTime Research. lovesnow@mail.ustc.edu.cn, fzhao956@ustc.edu.cn, zhenyuli17@hit.edu.cn, {zhangshiquan,fangliangji,jiangqinhong}@senseauto.com
Pseudocode | No | The paper does not contain a pseudocode or algorithm block.
Open Source Code | Yes | Code will be available at https://github.com/zehuichen123/BEVDistill.
Open Datasets | Yes | We conduct the experiments on the nuScenes dataset (Caesar et al., 2020), which is one of the most popular datasets for 3D object detection.
Dataset Splits | Yes | It consists of 700 scenes for training, 150 scenes for validation, and 150 scenes for testing.
Hardware Specification | Yes | All models are trained on 8 NVIDIA A100 GPUs.
Software Dependencies | No | The paper states 'Our codebase is built on MMDetection3D (Contributors, 2020) toolkit.' but does not provide specific version numbers for software dependencies such as PyTorch, CUDA, or the MMDetection3D toolkit itself.
Experiment Setup | Yes | During the distillation phase, the batch size is set to 1 per GPU with an initial learning rate of 2e-4. Unless otherwise specified, we train the models for 2× schedule (24 epochs) with a cyclic policy. The input image size is set to 1600 × 900 and the grid size of the BEV plane in BEVFormer is set to 128 × 128.
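
The reported setup can be collected into one record for a quick sanity check. The following is a minimal sketch, not the authors' configuration file: the class name DistillSetup and all field names are hypothetical, the (width, height) order of the image size is an assumption, and the total batch size assumes plain data parallelism over all 8 GPUs with no gradient accumulation, which the summary above does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DistillSetup:
    """Hypothetical container for the values quoted in the table above."""

    num_gpus: int = 8                 # "8 NVIDIA A100 GPUs"
    samples_per_gpu: int = 1          # "batch size is set to 1 per GPU"
    initial_lr: float = 2e-4          # initial learning rate
    epochs: int = 24                  # 2x schedule
    lr_policy: str = "cyclic"         # cyclic learning-rate policy
    image_size: tuple = (1600, 900)   # (width, height); the order is an assumption
    bev_grid: tuple = (128, 128)      # BEV plane grid size in BEVFormer
    train_scenes: int = 700
    val_scenes: int = 150
    test_scenes: int = 150

    @property
    def total_batch_size(self) -> int:
        # Assumption: every GPU holds one replica and gradients are averaged
        # across replicas, with no gradient accumulation.
        return self.num_gpus * self.samples_per_gpu

    @property
    def total_scenes(self) -> int:
        # Sum of the three reported dataset splits.
        return self.train_scenes + self.val_scenes + self.test_scenes

    @property
    def bev_cells(self) -> int:
        # Number of cells in the BEV grid.
        return self.bev_grid[0] * self.bev_grid[1]


setup = DistillSetup()
print("total batch size:", setup.total_batch_size)
print("total scenes:", setup.total_scenes)
print("BEV cells:", setup.bev_cells)
```

Under these assumptions the total batch size is 8, the three splits add up to 1000 scenes, and the BEV grid has 16384 cells. Software versions are left out because the paper does not report them.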