M-BEV: Masked BEV Perception for Robust Autonomous Driving

Authors: Siran Chen, Yue Ma, Yu Qiao, Yali Wang

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform extensive experiments on the popular nuScenes benchmark, where our framework can significantly boost 3D perception performance of the state-of-the-art models on various missing view cases, e.g., for the absence of back view, our M-BEV promotes the PETRv2 model with a 10.3% mAP gain.
Researcher Affiliation | Academia | (1) Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; (2) School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; (3) Shanghai Artificial Intelligence Laboratory, Shanghai, China; (4) Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions following the 'official implementation on open-sourced code bases' for baseline models (PETRv2, BEVStereo) but does not state that the code for the proposed M-BEV framework is open source, nor does it provide a link.
Open Datasets | Yes | We conduct our experiments on the popular nuScenes dataset (Caesar et al. 2020). nuScenes is a large-scale benchmark for autonomous driving, where the data is collected from 1000 real driving scenes with around 20 seconds duration. The scenes are divided: 700 of them for training, and 150 each for validation and testing.
Dataset Splits | Yes | The scenes are divided: 700 of them for training, and 150 each for validation and testing.
Hardware Specification | Yes | We use 8 A5000 GPUs for all experiments.
Software Dependencies | No | The paper mentions using 'open-sourced code bases' and specific models (PETRv2, BEVStereo) but does not provide specific version numbers for software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | The MVR module is fine-tuned for 48 epochs with a learning rate of 2.0 × 10⁻⁴. The decoder has four transformer layers and a hidden dimension of 512. [...] we set a weight coefficient α = 0.05 for the reconstruction loss in the fine-tuning.
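
For readers attempting to reproduce the fine-tuning stage, the reported hyper-parameters in the Experiment Setup row can be collected into a minimal PyTorch-style sketch. This is not the authors' released code: the names MVRDecoder and total_loss, the eight attention heads, and the AdamW optimizer are illustrative assumptions; only the numeric values (48 epochs, learning rate 2.0 × 10⁻⁴, four decoder layers, hidden dimension 512, α = 0.05) are taken from the paper.

```python
# Minimal sketch of the reported M-BEV fine-tuning setup, assuming a
# PyTorch-style training loop. MVRDecoder, total_loss, the 8 attention
# heads, and AdamW are illustrative assumptions, not details from the paper.
import torch
import torch.nn as nn

EPOCHS = 48               # MVR module fine-tuned for 48 epochs (reported)
LR = 2.0e-4               # learning rate 2.0 x 10^-4 (reported)
NUM_DECODER_LAYERS = 4    # four transformer decoder layers (reported)
HIDDEN_DIM = 512          # decoder hidden dimension (reported)
ALPHA = 0.05              # weight on the reconstruction loss (reported)


class MVRDecoder(nn.Module):
    """Placeholder masked-view-reconstruction decoder (hypothetical)."""

    def __init__(self, hidden_dim=HIDDEN_DIM, num_layers=NUM_DECODER_LAYERS):
        super().__init__()
        layer = nn.TransformerDecoderLayer(
            d_model=hidden_dim, nhead=8, batch_first=True
        )
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.head = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, masked_tokens, memory):
        # Reconstruct features of the masked camera view from visible views.
        return self.head(self.decoder(masked_tokens, memory))


def total_loss(det_loss, rec_loss, alpha=ALPHA):
    # Fine-tuning objective: detection loss plus the reconstruction loss
    # weighted by alpha = 0.05, as stated in the experiment setup.
    return det_loss + alpha * rec_loss


model = MVRDecoder()
optimizer = torch.optim.AdamW(model.parameters(), lr=LR)  # optimizer choice assumed
```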