CRAFT: Camera-Radar 3D Object Detection with Spatio-Contextual Fusion Transformer

Authors: Youngseok Kim, Sanmin Kim, Jun Won Choi, Dongsuk Kum

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our camera-radar fusion approach achieves the state-of-the-art 41.1% mAP and 52.3% NDS on the nuScenes test set, which is 8.7 and 10.8 points higher than the camera-only baseline, as well as yielding competitive performance against LiDAR methods. ... 4 Experiments ... 4.1 Comparison with State-of-the-Arts ... 4.2 Ablation Studies ... 4.3 Analysis
Researcher Affiliation | Academia | Youngseok Kim¹, Sanmin Kim¹, Jun Won Choi², Dongsuk Kum¹. ¹Korea Advanced Institute of Science and Technology, ²Hanyang University. {youngseok.kim, sanmin.kim, dskum}@kaist.ac.kr, junwchoi@hanyang.ac.kr
Pseudocode | No | The paper describes algorithms and methods in text and figures but does not include any explicit pseudocode blocks or algorithm listings.
Open Source Code | No | The paper does not provide any explicit statements about open-sourcing the code or a link to a code repository.
Open Datasets | Yes | We evaluate our method on a large-scale and challenging nuScenes dataset (Caesar et al. 2020).
Dataset Splits | Yes | which consists of 700/150/150 scenes for train/val/test set. (See the split-enumeration sketch after this table.)
Hardware Specification | Yes | We train our models for 24 epochs with a batch size of 32, cosine annealing scheduler, and 2 × 10⁻⁴ learning rate on 4 RTX 3090 GPUs. Inference time is measured on an Intel Core i9 CPU and an RTX 3090 GPU without test time augmentation for fusion.
Software Dependencies | No | The paper mentions using CenterNet and a DLA34 backbone, but it does not specify version numbers for these or other software libraries/frameworks.
Experiment Setup | Yes | We train our models for 24 epochs with a batch size of 32, cosine annealing scheduler, and 2 × 10⁻⁴ learning rate on 4 RTX 3090 GPUs. (See the training-schedule sketch after this table.)
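
For the Dataset Splits row, the quoted 700/150/150 scene split corresponds to the official nuScenes splits. Below is a minimal sketch of how those splits can be enumerated with the nuscenes-devkit; the devkit call is part of that library's public API and is not described in the paper, so treat it as an illustration of the split sizes rather than the authors' pipeline.

```python
# Minimal sketch: enumerate the official nuScenes scene splits via the
# nuscenes-devkit (assumed tooling; the paper does not describe this step).
from nuscenes.utils.splits import create_splits_scenes

splits = create_splits_scenes()  # dict: split name -> list of scene names
print(len(splits["train"]), len(splits["val"]), len(splits["test"]))
# Expected, matching the paper's description: 700 / 150 / 150 scenes.
```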
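
For the Hardware Specification and Experiment Setup rows, the quoted schedule (24 epochs, batch size 32, cosine annealing, 2 × 10⁻⁴ learning rate, 4 RTX 3090 GPUs) can be expressed as a short PyTorch sketch. The optimizer choice (AdamW), the placeholder model, and the per-epoch scheduler stepping are assumptions for illustration only; the paper does not release code or name its optimizer.

```python
import torch

# Sketch of the reported training schedule, assuming PyTorch and AdamW
# (the optimizer is NOT specified in the paper; the model is a stand-in).
model = torch.nn.Linear(256, 10)                      # placeholder, not the CRAFT detector
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)                    # 2 x 10^-4 (paper)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=24)   # cosine annealing (paper)

dummy_batch = torch.randn(32, 256)                    # batch size 32 (paper)
for epoch in range(24):                               # 24 epochs (paper)
    # The real setup iterates the nuScenes train split across 4 RTX 3090 GPUs;
    # a single dummy step stands in for that loop here.
    loss = model(dummy_batch).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                                  # step once per epoch (assumption)
```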