DVPE: Divided View Position Embedding for Multi-View 3D Object Detection
Authors: Jiasen Wang, Zhenglin Li, Ke Sun, Xianyuan Liu, Yang Zhou
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our framework, named DVPE, achieves state-of-the-art performance (57.2% mAP and 64.5% NDS) on the nuScenes test set. |
| Researcher Affiliation | Academia | Jiasen Wang¹, Zhenglin Li¹, Ke Sun¹, Xianyuan Liu² and Yang Zhou¹ (¹Shanghai University, ²University of Sheffield) |
| Pseudocode | No | The paper describes its methods using text and diagrams (Figures 1, 2, 3) but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Codes will be available at https://github.com/dop0/DVPE. |
| Open Datasets | Yes | Our framework is evaluated on the nuScenes dataset [Caesar et al., 2020]. It contains 1k driving scenes, each with a duration of 20 seconds. |
| Dataset Splits | Yes | The dataset is split into three groups: 750 for training, 150 for validation, and 150 for testing. |
| Hardware Specification | No | The paper mentions training models but does not specify any hardware details such as GPU models, CPU types, or cloud computing resources used for the experiments. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer and backbone networks like ResNet and VoVNet, but it does not provide specific version numbers for any software dependencies or libraries (e.g., PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | The learning rate and batch size are set to 4×10⁻⁴ and 16, respectively. Our models for performance comparison are trained for 60 epochs, whereas in ablation studies they are trained for 24 epochs. For the proposed framework, the 3D world space is divided into 6 spaces and the shift angle is incremented by 20 degrees at each layer. By default, the top 128 2D RoI features are cached in a memory queue with a length of 4 frames. We adopt one additional group of queries to perform one-to-many assignment training, and the number of 3D object queries and additional ones are both set to 900. |
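The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. This is purely illustrative: the key names below are assumptions for readability and do not come from the authors' released code.

```python
# Illustrative training configuration reconstructed from the paper's reported
# setup; all key names are hypothetical, not taken from the DVPE codebase.
dvpe_config = {
    "lr": 4e-4,                 # learning rate
    "batch_size": 16,
    "epochs_main": 60,          # models used for performance comparison
    "epochs_ablation": 24,      # models used in ablation studies
    "num_divided_spaces": 6,    # partitions of the 3D world space
    "shift_angle_deg": 20,      # shift angle increment per decoder layer
    "roi_topk": 128,            # top-k 2D RoI features cached
    "memory_queue_frames": 4,   # memory queue length in frames
    "num_object_queries": 900,  # 3D object queries
    "num_extra_queries": 900,   # additional one-to-many assignment group
}

# Cumulative shift angle at each decoder layer, assuming the 20-degree
# increment is applied per layer starting from zero:
shift_angles = [layer * dvpe_config["shift_angle_deg"]
                for layer in range(dvpe_config["num_divided_spaces"])]
print(shift_angles)  # [0, 20, 40, 60, 80, 100]
```

The cumulative-angle list only illustrates how a per-layer 20-degree increment accumulates; the paper does not state the starting angle or the exact layer count this applies to.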