iDet3D: Towards Efficient Interactive Object Detection for LiDAR Point Clouds

Authors: Dongmin Choi, Wonwoo Cho, Kangyeol Kim, Jaegul Choo

AAAI 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Through our extensive experiments, we present that our method can construct precise annotations in a few clicks, which shows the practicality as an efficient annotation tool for 3D object detection. In the experiments, our main interest is two-fold: (1) demonstrating the effectiveness of the interactive approach compared to state-of-the-art automatic (non-interactive) detectors, i.e., the performance can be significantly improved by using a few user clicks, and (2) validating the effectiveness of the proposed components of iDet3D (NCS, DCG, and SCP) by comparing iDet3D to the vanilla model. |
| Researcher Affiliation | Collaboration | Dongmin Choi1,2*, Wonwoo Cho1,2*, Kangyeol Kim1,2, Jaegul Choo1,2; 1Letsur Inc.; 2Korea Advanced Institute of Science and Technology; {dmchoi, wcho, kangyeolk, jchoo}@kaist.ac.kr |
| Pseudocode | No | No pseudocode or algorithm blocks were found. |
| Open Source Code | No | The paper mentions referring to existing codebases for IA-SSD and SASA, but does not provide concrete access or an explicit statement about releasing the source code for iDet3D. |
| Open Datasets | Yes | Datasets. KITTI benchmark (Geiger, Lenz, and Urtasun 2012) is a widely used 3D object detection dataset... We also evaluate iDet3D on a more challenging nuScenes dataset (Caesar et al. 2020)... |
| Dataset Splits | Yes | KITTI benchmark (Geiger, Lenz, and Urtasun 2012) is a widely used 3D object detection dataset, which consists of 3,712 training and 3,769 validation samples with three object classes: Car, Pedestrian, and Cyclist. |
| Hardware Specification | Yes | We use 4 NVIDIA RTX A6000 GPUs for experiments. |
| Software Dependencies | No | The paper refers to existing codebases (IA-SSD, SASA) but does not provide specific software names with version numbers for reproducibility. |
| Experiment Setup | Yes | The number of clicks K is determined as min(N_u, N_o), where N_u is sampled from {0, ..., 10} uniformly at random and N_o refers to the number of existing objects in each scene. The distance threshold τ of user encodings is set to 2.0 in Eq. (1). For negative clicks, we set the maximum number K_n to 10. |
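
The click-count rule quoted under Experiment Setup can be illustrated with a short sketch. This is not the authors' code (the summary notes that none was released). It assumes the sampling set is the integers 0 through 10 inclusive. The function name `sample_num_clicks` and the constant names are hypothetical. The threshold τ and the negative-click cap are recorded only as constants, because Eq. (1) is not reproduced here.

```python
import random

# Values quoted in the Experiment Setup row; the names are illustrative only.
MAX_SAMPLED_CLICKS = 10    # upper end of the set the sampled click count is drawn from
MAX_NEGATIVE_CLICKS = 10   # maximum number of negative clicks
DISTANCE_THRESHOLD = 2.0   # tau for user encodings; Eq. (1) is not shown in this summary


def sample_num_clicks(num_objects, rng=None):
    """Return K = min(sampled count, number of objects in the scene).

    Assumes the sampled count is drawn uniformly from the integers
    0..MAX_SAMPLED_CLICKS, both ends included.
    """
    if num_objects < 0:
        raise ValueError("num_objects must be non-negative")
    rng = rng if rng is not None else random.Random()
    sampled = rng.randint(0, MAX_SAMPLED_CLICKS)  # randint includes both endpoints
    return min(sampled, num_objects)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs draw the same values
    for num_objects in (0, 3, 25):
        print(num_objects, sample_num_clicks(num_objects, rng))
```

Taking the minimum means a scene never receives more simulated clicks than it has objects, and a scene with no objects always receives zero.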