DV-3DLane: End-to-end Multi-modal 3D Lane Detection with Dual-view Representation
Authors: Yueru Luo, Shuguang Cui, Zhen Li
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the public benchmark, OpenLane, demonstrate the efficacy and efficiency of DV-3DLane. It achieves state-of-the-art performance, with a remarkable 11.2 gain in F1 score and a substantial 53.5% reduction in errors. |
| Researcher Affiliation | Academia | 1 FNii, CUHK-Shenzhen 2 School of Science and Engineering, CUHK-Shenzhen {222010057@link.,shuguangcui@,lizhen@}cuhk.edu.cn |
| Pseudocode | Yes | Algorithm 1 Bidirectional Feature Fusion (BFF). Input: LiDAR points Ppt, image I, camera parameters T. Output: mm-aware PV features Fpv, BEV features Fbev (mm denotes multi-modal). |
| Open Source Code | Yes | The code is available at https://github.com/JMoonr/dv-3dlane. |
| Open Datasets | Yes | We evaluate our method on OpenLane (Chen et al., 2022), the sole public 3D lane dataset featuring multi-modal sources. OpenLane is a large-scale dataset built on the Waymo Open Dataset (Sun et al., 2020). |
| Dataset Splits | No | The paper mentions training and testing on datasets like OpenLane-1K and OpenLane-300, but it does not specify exact percentages or sample counts for validation splits, nor does it explicitly cite predefined validation splits. |
| Hardware Specification | Yes | All models are tested on a single V100 GPU |
| Software Dependencies | No | The paper mentions using 'Adam optimizer' and 'cosine annealing scheduler' but does not specify software versions for these or other dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We set the number of lane queries to 40, and we employ deformable attention with 4 heads, 8 sample points, and 256 embedding dimensions. We use the Adam optimizer (Kingma & Ba, 2014) with a weight decay of 0.01. The learning rate is set to 2e-4, and our models undergo training for 24 epochs with a batch size of 32. We employ the cosine annealing scheduler (Loshchilov & Hutter, 2016) with Tmax = 8. Our input images are of resolution 720 × 960, and we adopt a voxel size of (0.2m, 0.4m) for the X and Y axes. |
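For anyone attempting a reproduction, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. This is a minimal illustration only: the key names below are hypothetical and not taken from the authors' released code; the values are those reported in the paper.

```python
# Hedged sketch of the DV-3DLane training setup reported in the paper.
# Key names are illustrative; values come from the quoted Experiment Setup row.
train_config = {
    "num_lane_queries": 40,
    "deformable_attention": {
        "num_heads": 4,
        "num_sample_points": 8,
        "embed_dim": 256,
    },
    "optimizer": "Adam",          # Kingma & Ba (2014)
    "weight_decay": 0.01,
    "learning_rate": 2e-4,
    "epochs": 24,
    "batch_size": 32,
    "lr_scheduler": "cosine_annealing",  # Loshchilov & Hutter (2016)
    "t_max": 8,
    "input_resolution_hw": (720, 960),   # image height x width in pixels
    "voxel_size_xy_m": (0.2, 0.4),       # meters along the X and Y axes
}

# Quick sanity check that the sketch matches the reported values.
assert train_config["learning_rate"] == 2e-4
assert train_config["deformable_attention"]["embed_dim"] == 256
```

Note that the paper does not report software versions (see the Software Dependencies row), so a reproduction would still need to pin a framework and CUDA version independently.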