Regulating Intermediate 3D Features for Vision-Centric Autonomous Driving

Authors: Junkai Xu, Liang Peng, Haoran Cheng, Linxuan Xia, Qi Zhou, Dan Deng, Wei Qian, Wenxiao Wang, Deng Cai

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on the Occ3D and nuScenes datasets demonstrate that Vampire facilitates fine-grained and appropriate extraction of dense 3D features, and is competitive with existing SOTA methods across diverse downstream perception tasks like 3D occupancy prediction, LiDAR segmentation and 3D object detection, while utilizing moderate GPU resources.
Researcher Affiliation | Collaboration | Junkai Xu (1,2,*), Liang Peng (1,2,*), Haoran Cheng (1,2,*), Linxuan Xia (1,2,*), Qi Zhou (1,2,*), Dan Deng (2), Wei Qian (2), Wenxiao Wang (3), Deng Cai (1,2); 1: State Key Lab of CAD & CG, Zhejiang University; 2: FABU Inc.; 3: School of Software Technology, Zhejiang University
Pseudocode | No | The paper describes its method in detail using text, figures, and mathematical equations, but it does not provide pseudocode or a clearly labeled algorithm block.
Open Source Code | Yes | We provide a video demonstration in the supplementary materials and codes are available at github.com/cskkxjk/Vampire.
Open Datasets | Yes | Experiments use the publicly available nuScenes dataset and the Occ3D-nuScenes benchmark (Tian et al. 2023). The nuScenes dataset contains 1000 scenes of 20 seconds duration each, and the key samples are annotated at 2 Hz. Each sample consists of RGB images from 6 surrounding cameras with 360° horizontal FOV and point cloud data from a 32-beam LiDAR.
Dataset Splits | Yes | The 1000 nuScenes scenes are officially divided into training, validation and test splits with 700, 150 and 150 scenes, respectively. Occ3D-nuScenes (Tian et al. 2023) contains 700 training scenes and 150 validation scenes.
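As a reference point, the official 700/150/150 scene split can be enumerated with the nuscenes-devkit. The sketch below is an illustration only and is not taken from the paper; it assumes the devkit is installed (pip install nuscenes-devkit).

```python
# Minimal sketch: list the official nuScenes scene splits via the
# nuscenes-devkit. Using the devkit here is an assumption of this note,
# not something the paper specifies.
from nuscenes.utils.splits import create_splits_scenes

splits = create_splits_scenes()  # maps split name -> list of scene names
for name in ("train", "val", "test"):
    print(name, len(splits[name]))  # expected: 700, 150, 150 scenes
```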
Hardware Specification | Yes | All models are trained for 24 epochs with a total batch size of 8 on 8 3080Ti GPUs (12 GB).
Software Dependencies | No | The paper mentions using ResNet-50 as the image backbone and basing its implementation on BEVDepth, but it does not specify software versions for these or other dependencies (e.g., Python or PyTorch versions).
Experiment Setup | Yes | Our implementation is based on the official repository of BEVDepth (Li et al. 2022b). We use ResNet-50 (He et al. 2016) as the image backbone and an image resolution of 256×704 to meet our computational resources. For the inpainting network, we adopt an hourglass-like architecture (further details are provided in our supplementary materials). The intermediate 3D feature resolution is 20×256×256, corresponding to the range [−3.0, 5.0] × [−51.2, 51.2] × [−51.2, 51.2] (meters), and the 3D feature dimension is set to 16 by default. We use AdamW as the optimizer with a learning rate of 2e-4 and a weight decay of 1e-7. All models are trained for 24 epochs with a total batch size of 8 on 8 3080Ti GPUs (12 GB).
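The reported geometry and hyperparameters translate into a compact configuration. The PyTorch sketch below is illustrative only: the placeholder module is a hypothetical stand-in (the authors' actual configuration lives in their BEVDepth-based repository), and only the numeric values come from the paper. Note that both the 20-bin vertical range (8 m) and the 256-bin horizontal ranges (102.4 m) work out to 0.4 m per voxel.

```python
# Illustrative sketch of the reported Vampire training setup.
# Only the numbers come from the paper; module names are placeholders.
import torch

# Intermediate 3D feature grid: 20 x 256 x 256 voxels covering
# z in [-3.0, 5.0] m and x, y in [-51.2, 51.2] m.
point_cloud_range = [-51.2, -51.2, -3.0, 51.2, 51.2, 5.0]  # x/y/z min, x/y/z max
grid_shape = (20, 256, 256)  # (Z, Y, X) bins
voxel_size = (
    (5.0 - (-3.0)) / 20,     # 0.4 m per z bin
    (51.2 - (-51.2)) / 256,  # 0.4 m per y bin
    (51.2 - (-51.2)) / 256,  # 0.4 m per x bin
)
feature_dim = 16             # channels of the intermediate 3D features
image_size = (256, 704)      # input resolution (H, W), ResNet-50 backbone

# Placeholder module standing in for the full network
# (ResNet-50 image backbone + hourglass-like inpainting network).
model = torch.nn.Linear(feature_dim, feature_dim)

# AdamW with lr 2e-4 and weight decay 1e-7; trained for 24 epochs with a
# total batch size of 8 spread across 8 x 12 GB 3080Ti GPUs.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4, weight_decay=1e-7)
```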