Regulating Intermediate 3D Features for Vision-Centric Autonomous Driving
Authors: Junkai Xu, Liang Peng, Haoran Cheng, Linxuan Xia, Qi Zhou, Dan Deng, Wei Qian, Wenxiao Wang, Deng Cai
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the Occ3D and nuScenes datasets demonstrate that Vampire facilitates fine-grained and appropriate extraction of dense 3D features, and is competitive with existing SOTA methods across diverse downstream perception tasks like 3D occupancy prediction, LiDAR segmentation and 3D object detection, while utilizing moderate GPU resources. |
| Researcher Affiliation | Collaboration | Junkai Xu^{1,2,*}, Liang Peng^{1,2,*}, Haoran Cheng^{1,2,*}, Linxuan Xia^{1,2,*}, Qi Zhou^{1,2,*}, Dan Deng^2, Wei Qian^2, Wenxiao Wang^3, Deng Cai^{1,2} (^1 State Key Lab of CAD & CG, Zhejiang University; ^2 FABU Inc.; ^3 School of Software Technology, Zhejiang University) |
| Pseudocode | No | The paper describes its method in detail using text, figures, and mathematical equations, but it does not provide pseudocode or a clearly labeled algorithm block. |
| Open Source Code | Yes | We provide a video demonstration in the supplementary materials, and code is available at github.com/cskkxjk/Vampire. |
| Open Datasets | Yes | Experimental results on the Occ3D and nuScenes datasets demonstrate that Vampire facilitates fine-grained and appropriate extraction of dense 3D features, and is competitive with existing SOTA methods across diverse downstream perception tasks like 3D occupancy prediction, LiDAR segmentation and 3D object detection, while utilizing moderate GPU resources. Datasets. The nuScenes dataset contains 1000 scenes of 20 seconds duration each, and the key samples are annotated at 2 Hz. Each sample consists of RGB images from 6 surrounding cameras with 360° horizontal FOV and point cloud data from a 32-beam LiDAR. The total of 1000 scenes are officially divided into training, validation and test splits with 700, 150 and 150 scenes, respectively. Occ3D-nuScenes (Tian et al. 2023) contains 700 training scenes and 150 validation scenes. |
| Dataset Splits | Yes | The nuScenes dataset contains 1000 scenes of 20 seconds duration each, and the key samples are annotated at 2 Hz. Each sample consists of RGB images from 6 surrounding cameras with 360° horizontal FOV and point cloud data from a 32-beam LiDAR. The total of 1000 scenes are officially divided into training, validation and test splits with 700, 150 and 150 scenes, respectively. Occ3D-nuScenes (Tian et al. 2023) contains 700 training scenes and 150 validation scenes. |
| Hardware Specification | Yes | All models are trained for 24 epochs with a total batch size of 8 on 8 3080Ti GPUs (12GB). |
| Software Dependencies | No | The paper mentions using ResNet-50 as an image backbone and basing implementation on BEVDepth, but it does not specify software versions for these or other dependencies (e.g., Python, PyTorch versions). |
| Experiment Setup | Yes | Our implementation is based on the official repository of BEVDepth (Li et al. 2022b). We use ResNet-50 (He et al. 2016) as the image backbone and an image resolution of 256 × 704 to meet our computational resources. For the inpainting network, we adopt an hourglass-like architecture (further details are provided in our supplementary materials). The intermediate 3D feature resolution is 20 × 256 × 256, corresponding to the range of [-3.0, 5.0] × [-51.2, 51.2] × [-51.2, 51.2] (meters), and the 3D feature dimension is set to 16 by default. We use AdamW as the optimizer with a learning rate of 2e-4 and weight decay of 1e-7. All models are trained for 24 epochs with a total batch size of 8 on 8 3080Ti GPUs (12GB). |
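The 700/150/150 scene counts quoted in the Dataset Splits row are the official nuScenes splits. A minimal sketch of enumerating them with the nuscenes-devkit package (an assumption; the paper does not state which tooling it used to load the splits):

```python
# Sketch: enumerating the official nuScenes scene splits.
# Assumes the nuscenes-devkit package (pip install nuscenes-devkit);
# the paper does not specify its data-loading tooling.
from nuscenes.utils.splits import create_splits_scenes

splits = create_splits_scenes()  # dict: split name -> list of scene names
print(len(splits["train"]), len(splits["val"]), len(splits["test"]))
# Expected: 700 150 150, matching the counts quoted in the table.
```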
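The feature grid in the Experiment Setup row implies a uniform 0.4 m voxel size: the vertical range [-3.0, 5.0] m spans 8 m over 20 cells, and each horizontal range [-51.2, 51.2] m spans 102.4 m over 256 cells. A quick arithmetic check (our calculation, not stated in the paper):

```python
# Verify that the stated grid resolution and metric range imply
# 0.4 m voxels in every dimension (our arithmetic, not the paper's).
grid = (20, 256, 256)  # (Z, Y, X) cell counts
ranges = ((-3.0, 5.0), (-51.2, 51.2), (-51.2, 51.2))  # meters

for cells, (lo, hi) in zip(grid, ranges):
    print((hi - lo) / cells)  # prints 0.4 for each axis
```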
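The optimizer settings in the Experiment Setup row map directly onto a standard PyTorch configuration. A hedged sketch under the assumption that the training loop is PyTorch-based (BEVDepth, which the implementation extends, is a PyTorch codebase); the tiny linear layer is a hypothetical stand-in, since the actual network lives in the github.com/cskkxjk/Vampire repository:

```python
# Sketch of the reported optimization setup in PyTorch. The Linear
# layer is a placeholder for the actual Vampire network, which is
# not reproduced here.
import torch

model = torch.nn.Linear(16, 16)  # hypothetical stand-in for the real model
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=2e-4,            # learning rate reported in the paper
    weight_decay=1e-7,  # weight decay reported in the paper
)

EPOCHS = 24        # reported training schedule
TOTAL_BATCH = 8    # total batch size across 8 3080Ti GPUs (12 GB each)
```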