RoboFusion: Towards Robust Multi-Modal 3D Object Detection via SAM

Authors: Ziying Song, Guoxing Zhang, Lin Liu, Lei Yang, Shaoqing Xu, Caiyan Jia, Feiyang Jia, Li Wang

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Consequently, RoboFusion achieves SOTA performance in noisy scenarios, as demonstrated by the KITTI-C and nuScenes-C benchmarks. ... We validate RoboFusion's robustness against OOD noise scenarios in KITTI-C and nuScenes-C datasets [Dong et al., 2023], achieving SOTA performance amid noise, as shown in Fig. 1.
Researcher Affiliation | Academia | Ziying Song (1,2), Guoxing Zhang (3), Lin Liu (1,2), Lei Yang (4), Shaoqing Xu (5), Caiyan Jia (1,2), Feiyang Jia (1,2), Li Wang (6); 1 School of Computer Science and Technology, Beijing Jiaotong University, China; 2 Beijing Key Lab of Traffic Data Analysis and Mining, China; 3 Hebei University of Science and Technology, China; 4 Tsinghua University, China; 5 University of Macau, China; 6 Beijing Institute of Technology, China; {songziying, cyjia, feiyangjia}@bjtu.edu.cn
Pseudocode | No | The paper contains figures and descriptions of the framework, but no structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/adept-thu/RoboFusion.
Open Datasets | Yes | We perform experiments on both the clean public benchmarks (KITTI [Geiger et al., 2012] and nuScenes [Caesar et al., 2020]) and the noisy public benchmarks (KITTI-C [Dong et al., 2023] and nuScenes-C [Dong et al., 2023]).
Dataset Splits | Yes | The KITTI dataset provides synchronized LiDAR point clouds and front-view camera images, and consists of 3,712 training samples, 3,769 validation samples, and 7,518 test samples. The nuScenes dataset is a large-scale 3D detection benchmark consisting of 700 training scenes, 150 validation scenes, and 150 testing scenes.
Hardware Specification | Yes | To enable effective training on the KITTI and nuScenes datasets, we utilize 8 NVIDIA A100 GPUs for network training. Additionally, the runtime is evaluated on an NVIDIA A100 GPU.
Software Dependencies | No | The paper mentions 'Adam optimizer' and 'OpenPCDet' but does not specify version numbers for any software dependencies.
Experiment Setup | Yes | Specifically, for KITTI, our RoboFusion based on FocalsConv [Chen et al., 2022] involves training for 80 epochs. For nuScenes, our RoboFusion based on TransFusion [Bai et al., 2022] has 20 epochs of training. During the model inference stage, we employ a non-maximal suppression (NMS) operation in the Region Proposal Network (RPN) with an IoU threshold of 0.7. We select the top 100 region proposals to serve as inputs for the detection head. After refinement, we apply NMS again with an IoU threshold of 0.1 to eliminate redundant predictions.
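
The experiment-setup row above describes a concrete two-stage inference schedule (RPN NMS at IoU 0.7, top-100 proposals, refinement, then a final NMS at IoU 0.1). Below is a minimal Python sketch of that schedule only, not the released RoboFusion/OpenPCDet code: axis-aligned 2D IoU stands in for the rotated BEV IoU used by real 3D detectors, and the box layout, `refine_fn` hook, and function names are hypothetical placeholders.

```python
# Minimal sketch of the two-stage NMS schedule quoted above (assumptions:
# axis-aligned 2D boxes instead of rotated BEV boxes; refine_fn is a
# placeholder for the detection head's refinement step).
import numpy as np

def iou_2d(box, boxes):
    """IoU between one box and many boxes, each as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_thresh):
    """Greedy NMS; returns indices of kept boxes, highest score first."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        ious = iou_2d(boxes[i], boxes[order[1:]])
        order = order[1:][ious <= iou_thresh]
    return np.array(keep, dtype=int)

def inference_schedule(rpn_boxes, rpn_scores, refine_fn):
    # Stage 1: RPN NMS with IoU threshold 0.7, keep the top 100 proposals.
    keep = nms(rpn_boxes, rpn_scores, iou_thresh=0.7)[:100]
    proposals = rpn_boxes[keep]
    # Stage 2: refine proposals with the detection head (placeholder hook),
    # then remove redundant predictions with an aggressive IoU threshold of 0.1.
    boxes, scores = refine_fn(proposals)
    final = nms(boxes, scores, iou_thresh=0.1)
    return boxes[final], scores[final]
```

In an actual OpenPCDet-based pipeline, thresholds and proposal counts like these would typically be read from the model configuration file rather than hard-coded as above.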