Is Your LiDAR Placement Optimized for 3D Scene Understanding?
Authors: Ye Li, Lingdong Kong, Hanjiang Hu, Xiaohao Xu, Xiaonan Huang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Latest advancements have prompted increasing interest in multi-LiDAR perception. However, prevailing driving datasets predominantly utilize single-LiDAR systems and collect data devoid of adverse conditions, failing to capture the complexities of real-world environments accurately. Addressing these gaps, we propose Place3D, a full-cycle pipeline that encompasses LiDAR placement optimization, data generation, and downstream evaluations. Our framework makes three appealing contributions. 1) To identify the most effective configurations for multi-LiDAR systems, we introduce the Surrogate Metric of the Semantic Occupancy Grids (M-SOG) to evaluate LiDAR placement quality. 2) Leveraging the M-SOG metric, we propose a novel optimization strategy to refine multi-LiDAR placements. 3) Centered around the theme of multi-condition multi-LiDAR perception, we collect a 280,000-frame dataset from both clean and adverse conditions. Extensive experiments demonstrate that LiDAR placements optimized using our approach outperform various baselines. We showcase exceptional results in both LiDAR semantic segmentation and 3D object detection tasks, under diverse weather and sensor failure conditions. (An illustrative sketch of the surrogate-metric idea appears after the table.) |
| Researcher Affiliation | Academia | Ye Li¹, Lingdong Kong², Hanjiang Hu³, Xiaohao Xu¹, Xiaonan Huang¹; ¹University of Michigan, Ann Arbor; ²National University of Singapore; ³Carnegie Mellon University |
| Pseudocode | Yes | Algorithm 1: Multi-LiDAR Placement Optimization for Semantic 3D Scene Understanding (a generic stand-in for this optimization loop is sketched after the table) |
| Open Source Code | Yes | https://github.com/ywyeli/Place3D |
| Open Datasets | Yes | Data Generation. We generate LiDAR point clouds and ground truth using CARLA [23]. We use the maps of Towns 1, 3, 4, and 6 and set 6 ego-vehicle routes for each map. We incorporate 23 semantic classes for LiDAR semantic segmentation and 3 instance classes for 3D object detection. Data collection is performed for 10 LiDAR placements, resulting in a total of 280,000 frames: 1) For each placement, we gather 340 clean scenes and 360 corrupted scenes, with each scene consisting of 40 frames. 2) The clean set comprises 13,600 frames, including 11,200 samples (280 scenes) for training and 2,400 samples (60 scenes) for validation, following the split ratio used in nuScenes [7]. ...The Place3D dataset is released under the CC BY-NC-SA 4.0 license. (The frame arithmetic is verified in a snippet after the table.) |
| Dataset Splits | Yes | 2) The clean set comprises 13,600 frames, including 11,200 samples (280 scenes) for training and 2,400 samples (60 scenes) for validation, following the split ratio used in nuScenes [7]. |
| Hardware Specification | Yes | All Li DAR semantic segmentation models are trained and tested on eight NVIDIA A100 SXM4 80GB GPUs. All 3D object detection models are trained and tested on four NVIDIA RTX 6000 Ada 48GB GPUs. |
| Software Dependencies | No | The paper mentions using the 'MMDetection3D codebase [20]' and specific optimizers like 'AdamW', but it does not specify version numbers for these or other software components like Python, PyTorch, or CUDA, which are necessary for full reproducibility. |
| Experiment Setup | Yes | The detailed training configurations of the four LiDAR semantic segmentation models, i.e., MinkUNet [18], SPVCNN [86], PolarNet [111], and Cylinder3D [117], are presented in Table 8. The detailed training configurations of the four 3D object detection models, i.e., PointPillars [51], CenterPoint [107], BEVFusion-L [63], and FSTR [109], are presented in Table 9. These tables include specific hyperparameters such as Batch Size, Epochs, Optimizer, Learning Rate, Weight Decay, and Epsilon. (A placeholder config fragment in this style follows the table.) |
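
The M-SOG metric is only described in prose in the abstract row above. As a rough, hedged illustration of the surrogate idea (not the paper's exact formulation), the sketch below scores a multi-LiDAR placement by the class-weighted fraction of semantic occupancy-grid voxels that fall inside at least one sensor's range and vertical field of view. The function name `msog_surrogate`, the FOV and range defaults, and the occlusion-free coverage test are all assumptions.

```python
import numpy as np

def msog_surrogate(voxel_centers, voxel_labels, lidar_mounts,
                   v_fov=(-30.0, 10.0), max_range=70.0, class_weights=None):
    """Hypothetical stand-in for the paper's M-SOG score.

    Marks a semantic occupancy voxel as covered if it lies within the
    range and vertical field of view of at least one LiDAR mount
    (occlusion is ignored for simplicity), then returns the
    class-weighted fraction of covered voxels.
    """
    voxel_centers = np.asarray(voxel_centers, dtype=float)   # (N, 3) voxel centers
    covered = np.zeros(len(voxel_centers), dtype=bool)
    for mount in lidar_mounts:                                # each mount: (x, y, z)
        rel = voxel_centers - np.asarray(mount, dtype=float)  # sensor-to-voxel vectors
        dist = np.linalg.norm(rel, axis=1)
        horiz = np.linalg.norm(rel[:, :2], axis=1)
        elev = np.degrees(np.arctan2(rel[:, 2], horiz))       # elevation angle (deg)
        covered |= (dist <= max_range) & (elev >= v_fov[0]) & (elev <= v_fov[1])
    if class_weights is None:
        return float(covered.mean())
    w = np.array([class_weights.get(int(c), 1.0) for c in voxel_labels])
    return float((w * covered).sum() / w.sum())
```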
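Algorithm 1 itself is not quoted in the Pseudocode row, so the loop below substitutes a generic random-perturbation hill climb that maximizes a surrogate score such as `msog_surrogate`; the step size, bounds, and greedy acceptance rule are invented for illustration and are not the paper's optimizer.

```python
import numpy as np

def optimize_placement(score_fn, init_mounts, low, high,
                       iters=200, step=0.2, seed=0):
    """Generic hill climb over LiDAR mount positions (an assumed
    stand-in for Algorithm 1): perturb one mount at a time, clip it to
    the allowed region on the vehicle, and keep the move only if the
    surrogate score improves."""
    rng = np.random.default_rng(seed)
    mounts = np.array(init_mounts, dtype=float)   # (num_lidars, 3) positions
    best = score_fn(mounts)
    for _ in range(iters):
        i = rng.integers(len(mounts))             # move one LiDAR per step
        cand = mounts.copy()
        cand[i] = np.clip(cand[i] + rng.normal(scale=step, size=3), low, high)
        s = score_fn(cand)
        if s > best:                              # greedy accept
            mounts, best = cand, s
    return mounts, best
```

With the earlier sketch, `score_fn` could be `lambda m: msog_surrogate(voxel_centers, voxel_labels, m)`, so the same surrogate that rates a placement also drives its refinement.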
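The frame counts quoted in the Open Datasets and Dataset Splits rows are internally consistent; the quick check below assumes 40 frames per scene throughout, as the quoted text states.

```python
FRAMES_PER_SCENE = 40
PLACEMENTS = 10
CLEAN_SCENES, CORRUPT_SCENES = 340, 360          # per placement
TRAIN_SCENES, VAL_SCENES = 280, 60               # clean split per placement

assert PLACEMENTS * (CLEAN_SCENES + CORRUPT_SCENES) * FRAMES_PER_SCENE == 280_000
assert CLEAN_SCENES == TRAIN_SCENES + VAL_SCENES
assert CLEAN_SCENES * FRAMES_PER_SCENE == 13_600   # clean set
assert TRAIN_SCENES * FRAMES_PER_SCENE == 11_200   # training samples
assert VAL_SCENES * FRAMES_PER_SCENE == 2_400      # validation samples
```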
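The Experiment Setup row names the hyperparameters listed in the paper's Tables 8 and 9 without their values. In the MMDetection3D config style the paper reports using (version unspecified, as the Software Dependencies row notes), those hyperparameters would typically appear as in the fragment below; every value here is a placeholder, not a number from the paper.

```python
# Hypothetical MMDetection3D-style training config fragment.
# All values are placeholders; the paper's actual settings live in its
# Tables 8 and 9, which are not reproduced in this report.
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(
        type='AdamW',        # optimizer named in the paper
        lr=1e-3,             # placeholder learning rate
        weight_decay=0.01,   # placeholder weight decay
        eps=1e-8,            # placeholder epsilon
    ),
)
train_cfg = dict(by_epoch=True, max_epochs=24)        # placeholder epochs
train_dataloader = dict(batch_size=4, num_workers=4)  # placeholder batch size
```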