RoboFusion: Towards Robust Multi-Modal 3D Object Detection via SAM
Authors: Ziying Song, Guoxing Zhang, Lin Liu, Lei Yang, Shaoqing Xu, Caiyan Jia, Feiyang Jia, Li Wang
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Consequently, RoboFusion achieves SOTA performance in noisy scenarios, as demonstrated by the KITTI-C and nuScenes-C benchmarks. ... We validate RoboFusion's robustness against OOD noise scenarios in KITTI-C and nuScenes-C datasets [Dong et al., 2023], achieving SOTA performance amid noise, as shown in Fig. 1. |
| Researcher Affiliation | Academia | Ziying Song (1,2), Guoxing Zhang (3), Lin Liu (1,2), Lei Yang (4), Shaoqing Xu (5), Caiyan Jia (1,2), Feiyang Jia (1,2), Li Wang (6). (1) School of Computer Science and Technology, Beijing Jiaotong University, China; (2) Beijing Key Lab of Traffic Data Analysis and Mining, China; (3) Hebei University of Science and Technology, China; (4) Tsinghua University, China; (5) University of Macau, China; (6) Beijing Institute of Technology, China. {songziying, cyjia, feiyangjia}@bjtu.edu.cn |
| Pseudocode | No | The paper contains figures and descriptions of the framework, but no structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/adept-thu/RoboFusion. |
| Open Datasets | Yes | We perform experiments on both the clean public benchmarks (KITTI [Geiger et al., 2012] and nuScenes [Caesar et al., 2020]) and the noisy public benchmarks (KITTI-C [Dong et al., 2023] and nuScenes-C [Dong et al., 2023]). |
| Dataset Splits | Yes | The KITTI dataset provides synchronized LiDAR point clouds and front-view camera images, consists of 3,712 training samples, 3,769 validation samples and 7,518 test samples. The nuScenes dataset is a large-scale 3D detection benchmark consisting of 700 training scenes, 150 validation scenes, and 150 testing scenes. |
| Hardware Specification | Yes | To enable effective training on the KITTI and nuScenes datasets, we utilize 8 NVIDIA A100 GPUs for network training. Additionally, the runtime is evaluated on an NVIDIA A100 GPU. |
| Software Dependencies | No | The paper mentions 'Adam optimizer' and 'OpenPCDet' but does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | Specifically, for KITTI, our RoboFusion based on Focals Conv [Chen et al., 2022] involves training for 80 epochs. For nuScenes, our RoboFusion based on TransFusion [Bai et al., 2022] has 20 epochs of training. During the model inference stage, we employ a non-maximal suppression (NMS) operation in the Region Proposal Network (RPN) with an IoU threshold of 0.7. We select the top 100 region proposals to serve as inputs for the detection head. After refinement, we apply NMS again with an IoU threshold of 0.1 to eliminate redundant predictions. |
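
The inference procedure quoted in the Experiment Setup row (NMS at IoU 0.7, top 100 proposals, then NMS at IoU 0.1) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes axis-aligned 2D boxes rather than the rotated 3D boxes a LiDAR detector would use, it assumes the top-100 cut is applied to the boxes that survive the first NMS, and it leaves out the detection head's refinement step. All function and constant names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Assumption: axis-aligned boxes (x1, y1, x2, y2); real 3D detectors use rotated boxes.
Box = Tuple[float, float, float, float]

RPN_IOU_THRESHOLD = 0.7    # first NMS, from the quoted setup
NUM_PROPOSALS = 100        # proposals passed to the detection head
FINAL_IOU_THRESHOLD = 0.1  # second NMS, after refinement


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float, top_k: Optional[int] = None) -> List[int]:
    """Greedy NMS: visit boxes by descending score, keep a box only if its
    IoU with every already-kept box is at most iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
            if top_k is not None and len(keep) == top_k:
                break
    return keep


def two_stage_nms(boxes: Sequence[Box], scores: Sequence[float]) -> List[int]:
    """Return indices of the final detections after both NMS passes."""
    proposals = nms(boxes, scores, RPN_IOU_THRESHOLD, top_k=NUM_PROPOSALS)
    # Placeholder: a detection head would refine boxes and scores here.
    sub_boxes = [boxes[i] for i in proposals]
    sub_scores = [scores[i] for i in proposals]
    kept = nms(sub_boxes, sub_scores, FINAL_IOU_THRESHOLD)
    return [proposals[k] for k in kept]


# Toy input: box 1 nearly duplicates box 0, box 2 overlaps it moderately,
# and box 3 is disjoint from the rest.
boxes = [(0, 0, 10, 10), (0.5, 0, 10.5, 10), (3, 3, 13, 13), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7, 0.6]
proposals = nms(boxes, scores, RPN_IOU_THRESHOLD, top_k=NUM_PROPOSALS)
final = two_stage_nms(boxes, scores)
print(proposals, final)
```

In this toy input the loose 0.7 threshold removes only the near-duplicate box, while the strict 0.1 threshold also removes the moderately overlapping one. That is the intended division of labour: the first pass keeps a diverse proposal set, and the second pass removes redundant predictions.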