Uni3DETR: Unified 3D Detection Transformer

Authors: Zhenyu Wang, Ya-Li Li, Xi Chen, Hengshuang Zhao, Shengjin Wang

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Extensive experiments validate that Uni3DETR exhibits excellent performance consistently on both indoor and outdoor 3D detection. Our Uni3DETR achieves state-of-the-art results on both indoor [53, 12, 1] and outdoor [16, 3] datasets. |
| Researcher Affiliation | Academia | 1 Department of Electronic Engineering, Tsinghua University, BNRist; 2 The University of Hong Kong |
| Pseudocode | No | The paper describes the model architecture and processes but does not include any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Codes are available at https://github.com/zhenyuw16/Uni3DETR. |
| Open Datasets | Yes | For indoor 3D detection, we evaluate Uni3DETR on three indoor 3D scene datasets: SUN RGB-D [53], ScanNet V2 [12], and S3DIS [1]. For outdoor 3D detection, we conduct experiments on two popular outdoor benchmarks: KITTI [16] and nuScenes [3]. |
| Dataset Splits | Yes | SUN RGB-D is a single-view indoor dataset with 5,285 training and 5,050 validation scenes... ScanNet V2 contains 1,201 reconstructed training scans and 312 validation scans... The KITTI dataset consists of 7,481 LiDAR samples for its official training set, and we split it into 3,712 training samples and 3,769 validation samples for training and evaluation. [For nuScenes,] we train on the 28,130 frames of samples in the training set and evaluate on the 6,019 validation samples. (A split-loading sketch follows the table.) |
| Hardware Specification | Yes | The computational cost is measured on a single RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions implementing Uni3DETR with MMDetection3D [11] and training with the AdamW [32] optimizer, but it does not specify version numbers for these software components or any other libraries. |
| Experiment Setup | Yes | We set the number of learnable query points to 300 for all datasets except nuScenes, where we set it to 900. For indoor datasets, we choose the 0.02m grid size. For the KITTI dataset, we use a (0.05m, 0.05m, 0.1m) voxel size, and for nuScenes, we use the (0.075m, 0.075m, 0.2m) voxel size. The nuScenes model is trained for 20 epochs with the CBGS [75] strategy. We train Uni3DETR with an initial learning rate of 1.67e-4 and a batch size of 32 for 90 epochs, and the learning rate is decayed by 10x on the 70th and 80th epochs. (Voxel-grid and training-schedule sketches follow the table.) |
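For the KITTI 3,712/3,769 split quoted above, the snippet below is a minimal sketch of how such a split is typically consumed, assuming the standard ImageSets/train.txt and ImageSets/val.txt id lists used by common KITTI pipelines; the data path is hypothetical.

```python
from pathlib import Path

# Hypothetical dataset root; the standard KITTI 3D-detection split ships
# as two id lists: ImageSets/train.txt and ImageSets/val.txt.
splits_dir = Path("data/kitti/ImageSets")

train_ids = splits_dir.joinpath("train.txt").read_text().split()
val_ids = splits_dir.joinpath("val.txt").read_text().split()

# The two halves partition KITTI's official 7,481-sample training set.
assert len(train_ids) == 3712
assert len(val_ids) == 3769
assert len(train_ids) + len(val_ids) == 7481
```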
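The voxel sizes in the Experiment Setup row fix the detector's grid resolution once a point-cloud range is chosen. Below is a minimal sketch of that arithmetic, assuming the (0, -40, -3) to (70.4, 40, 1) range commonly used on KITTI; the range is our assumption, not stated in the excerpt.

```python
# Point-cloud range (x_min, y_min, z_min, x_max, y_max, z_max); this is the
# range commonly used for KITTI, assumed here for illustration only.
pc_range = (0.0, -40.0, -3.0, 70.4, 40.0, 1.0)
voxel_size = (0.05, 0.05, 0.1)  # KITTI setting reported in the paper

# Number of voxels along each axis: (max - min) / voxel_size.
grid = [round((pc_range[i + 3] - pc_range[i]) / voxel_size[i]) for i in range(3)]
print(grid)  # -> [1408, 1600, 40]
```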
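The training schedule (AdamW, initial learning rate 1.67e-4, batch size 32, 90 epochs, 10x decay at epochs 70 and 80) maps onto a standard PyTorch optimizer/scheduler pairing. This is a minimal sketch, not the authors' MMDetection3D configuration; the model and training loop are placeholders.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import MultiStepLR

# Placeholder module standing in for Uni3DETR.
model = torch.nn.Linear(256, 256)

# Reported recipe: AdamW at 1.67e-4; 10x decay on the 70th and 80th epochs.
optimizer = AdamW(model.parameters(), lr=1.67e-4)
scheduler = MultiStepLR(optimizer, milestones=[70, 80], gamma=0.1)

for epoch in range(90):
    # ... iterate over batches of size 32, compute the detection losses,
    # call loss.backward(), then:
    optimizer.step()   # placeholder for the per-iteration update
    scheduler.step()   # steps once per epoch; LR drops 10x at 70 and 80
```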