Fully Sparse 3D Object Detection
Authors: Lue Fan, Feng Wang, Naiyan Wang, Zhaoxiang Zhang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on the large-scale Waymo Open Dataset to reveal the inner workings, and state-of-the-art performance is reported. To demonstrate the superiority of FSD in long-range detection, we also conduct experiments on Argoverse 2 Dataset, which has a much larger perception range (200m) than Waymo Open Dataset (75m). |
| Researcher Affiliation | Collaboration | Lue Fan1,2,3,4 Feng Wang5 Naiyan Wang5 Zhaoxiang Zhang1,2,3,6 1Institute of Automation, Chinese Academy of Sciences 2University of Chinese Academy of Sciences 3National Laboratory of Pattern Recognition, CASIA 4School of Future Technology, UCAS 5TuSimple 6Center for Artificial Intelligence and Robotics, HKISI_CAS {fanlue2019, zhaoxiang.zhang}@ia.ac.cn {feng.wff, winsty}@gmail.com |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is released at https://github.com/TuSimple/SST. |
| Open Datasets | Yes | Dataset: Waymo Open Dataset (WOD) We conduct our main experiments on WOD [31]. WOD is currently the largest and most trustworthy benchmark for LiDAR-based 3D object detection. WOD contains 1150 sequences (more than 200K frames), 798 for training, 202 for validation, and 150 for test. Dataset: Argoverse 2 (AV2) We further conduct long-range experiments on the recently released Argoverse 2 dataset [37] to demonstrate the superiority of FSD in long-range detection. AV2 has a similar scale to WOD, and it contains 1000 sequences in total, 700 for training, 150 for validation, and 150 for test. |
| Dataset Splits | Yes | WOD contains 1150 sequences (more than 200K frames), 798 for training, 202 for validation, and 150 for test. ... AV2 has a similar scale to WOD, and it contains 1000 sequences in total, 700 for training, 150 for validation, and 150 for test. |
| Hardware Specification | Yes | Statistics are obtained on a single 3090 GPU with batch size 1. |
| Software Dependencies | No | The paper mentions MMDetection3D but does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, CUDA, MMDetection3D). |
| Experiment Setup | Yes | We use 4 sparse regional attention blocks [5] in SST as our voxel feature extractor. The SIR module and SIR2 module consist of 3 and 6 SIR layers, respectively. A SIR layer is defined by Eqn. 1 and Eqn. 2. Our SST-based model converges much faster than SST, so we train our models for 6 epochs instead of the 2x schedule (24 epochs) in SST. For FSDspconv, in addition to the 6-epoch schedule, we adopt a longer schedule (12 epochs) for better performance. Different from the default setting in MMDetection3D, we decrease the number of pasted instances in the Copy-Paste augmentation, to prevent FSD from overfitting. |
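As a quick sanity check of the dataset figures quoted in the table, the split counts for both benchmarks can be verified to sum to the stated totals. This is a minimal sketch; the helper name `check_splits` is ours, not from the paper or its released code, and the counts are taken verbatim from the excerpts above.

```python
def check_splits(name, train, val, test, total):
    """Verify that per-split sequence counts sum to the stated total
    and return each split's fraction of the dataset."""
    assert train + val + test == total, f"{name}: splits do not sum to {total}"
    return {"train": train / total, "val": val / total, "test": test / total}

# Waymo Open Dataset: 1150 sequences (798 train / 202 val / 150 test)
wod = check_splits("WOD", 798, 202, 150, 1150)

# Argoverse 2: 1000 sequences (700 train / 150 val / 150 test)
av2 = check_splits("AV2", 700, 150, 150, 1000)

print(f"WOD train fraction: {wod['train']:.1%}")  # ~69.4%
print(f"AV2 train fraction: {av2['train']:.1%}")  # 70.0%
```

Both datasets reserve roughly 70% of sequences for training, which is consistent with the paper's claim that AV2 "has a similar scale to WOD."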