DVPE: Divided View Position Embedding for Multi-View 3D Object Detection

Authors: Jiasen Wang, Zhenglin Li, Ke Sun, Xianyuan Liu, Yang Zhou

IJCAI 2024

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Our framework, named DVPE, achieves state-of-the-art performance (57.2% mAP and 64.5% NDS) on the nuScenes test set.
Researcher Affiliation | Academia | Jiasen Wang (1), Zhenglin Li (1), Ke Sun (1), Xianyuan Liu (2), and Yang Zhou (1); (1) Shanghai University, (2) University of Sheffield.
Pseudocode | No | The paper describes its methods using text and diagrams (Figures 1, 2, 3) but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Codes will be available at https://github.com/dop0/DVPE.
Open Datasets | Yes | Our framework is evaluated on the nuScenes dataset [Caesar et al., 2020]. It contains 1k driving scenes, each with a duration of 20 seconds.
Dataset Splits | Yes | The dataset is split into three groups: 750 scenes for training, 150 for validation, and 150 for testing.
Hardware Specification | No | The paper mentions training models but does not specify any hardware details such as GPU models, CPU types, or cloud computing resources used for the experiments.
Software Dependencies | No | The paper mentions using the AdamW optimizer and backbone networks such as ResNet and VoVNet, but it does not provide version numbers for any software dependencies or libraries (e.g., PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | The learning rate and batch size are set to 4×10⁻⁴ and 16, respectively. Models for performance comparison are trained for 60 epochs, whereas ablation models are trained for 24 epochs. For the proposed framework, the 3D world space is divided into 6 view spaces, and the shift angle is incremented by 20 degrees at each layer. By default, the top 128 2D RoI features are cached in a memory queue with a length of 4 frames. One additional group of queries performs one-to-many assignment training; the numbers of 3D object queries and additional queries are both set to 900.
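The paper provides no pseudocode, so as a rough illustration of the divided-view configuration reported above (3D space split into 6 angular view spaces, with the partition rotated by 20 degrees per layer), here is a minimal sketch. The function name `assign_view` and its signature are hypothetical, not taken from the paper; it only shows how a per-layer shift would move queries near sector borders into different views across layers.

```python
def assign_view(azimuth_deg, layer, num_views=6, shift_per_layer_deg=20.0):
    """Map a query's azimuth (degrees) to one of `num_views` equal angular
    sectors; the partition is rotated by `shift_per_layer_deg` at each layer.
    Hypothetical sketch of the divided-view idea, not the paper's code."""
    sector = 360.0 / num_views                              # 60 degrees per view
    shifted = (azimuth_deg - layer * shift_per_layer_deg) % 360.0
    return int(shifted // sector)
```

For example, a query at azimuth 10° falls in view 0 at layer 0 but, after one 20° shift, wraps into view 5 at layer 1, so objects near a sector boundary are attended to by different view groups at different layers.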