Uni3DETR: Unified 3D Detection Transformer

Authors: Zhenyu Wang, Ya-Li Li, Xi Chen, Hengshuang Zhao, Shengjin Wang

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Extensive experiments validate that Uni3DETR exhibits excellent performance consistently on both indoor and outdoor 3D detection. Our Uni3DETR achieves state-of-the-art results on both indoor [53, 12, 1] and outdoor [16, 3] datasets. |
| Researcher Affiliation | Academia | 1 Department of Electronic Engineering, Tsinghua University, BNRist; 2 The University of Hong Kong |
| Pseudocode | No | The paper describes the model architecture and processes but does not include any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Codes are available at https://github.com/zhenyuw16/Uni3DETR. |
| Open Datasets | Yes | For indoor 3D detection, we evaluate Uni3DETR on three indoor 3D scene datasets: SUN RGB-D [53], ScanNet V2 [12], and S3DIS [1]. For outdoor 3D detection, we conduct experiments on two popular outdoor benchmarks: KITTI [16] and nuScenes [3]. |
| Dataset Splits | Yes | SUN RGB-D is a single-view indoor dataset with 5,285 training and 5,050 validation scenes... ScanNet V2 contains 1,201 reconstructed training scans and 312 validation scans... The KITTI dataset consists of 7,481 LiDAR samples for its official training set, and we split it into 3,712 training samples and 3,769 validation samples for training and evaluation. [For nuScenes,] we train on the 28,130 frames of samples in the training set and evaluate on the 6,019 validation samples. (A split-loading sketch follows the table.) |
| Hardware Specification | Yes | The computational cost is measured on a single RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions implementing Uni3DETR with MMDetection3D [11] and training with the AdamW [32] optimizer, but it does not specify version numbers for these software components or any other libraries. |
| Experiment Setup | Yes | We set the number of learnable query points to 300 for all datasets except nuScenes, where we set it to 900. For indoor datasets, we choose the 0.02m grid size. For the KITTI dataset, we use a (0.05m, 0.05m, 0.1m) voxel size, and for nuScenes, we use the (0.075m, 0.075m, 0.2m) voxel size. The nuScenes model is trained for 20 epochs with the CBGS [75] strategy. We train Uni3DETR with an initial learning rate of 1.67e-4 and a batch size of 32 for 90 epochs, and the learning rate is decayed by 10x on the 70th and 80th epochs. (Voxel-grid and training-schedule sketches follow the table.) |
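For the KITTI 3,712/3,769 split quoted above, the snippet below is a minimal sketch of how such a split is typically consumed, assuming the standard ImageSets/train.txt and ImageSets/val.txt id lists used by common KITTI pipelines; the data path is hypothetical.

```python
from pathlib import Path

# Hypothetical dataset root; the standard KITTI 3D-detection split ships
# as two id lists: ImageSets/train.txt and ImageSets/val.txt.
splits_dir = Path("data/kitti/ImageSets")

train_ids = splits_dir.joinpath("train.txt").read_text().split()
val_ids = splits_dir.joinpath("val.txt").read_text().split()

# The two halves partition KITTI's official 7,481-sample training set.
assert len(train_ids) == 3712
assert len(val_ids) == 3769
assert len(train_ids) + len(val_ids) == 7481
```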
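The voxel sizes in the Experiment Setup row fix the detector's grid resolution once a point-cloud range is chosen. Below is a minimal sketch of that arithmetic, assuming the (0, -40, -3) to (70.4, 40, 1) range commonly used on KITTI; the range is our assumption, not stated in the excerpt.

```python
# Point-cloud range (x_min, y_min, z_min, x_max, y_max, z_max); this is the
# range commonly used for KITTI, assumed here for illustration only.
pc_range = (0.0, -40.0, -3.0, 70.4, 40.0, 1.0)
voxel_size = (0.05, 0.05, 0.1)  # KITTI setting reported in the paper

# Number of voxels along each axis: (max - min) / voxel_size.
grid = [round((pc_range[i + 3] - pc_range[i]) / voxel_size[i]) for i in range(3)]
print(grid)  # -> [1408, 1600, 40]
```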
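The training schedule (AdamW, initial learning rate 1.67e-4, batch size 32, 90 epochs, 10x decay at epochs 70 and 80) maps onto a standard PyTorch optimizer/scheduler pairing. This is a minimal sketch, not the authors' MMDetection3D configuration; the model and training loop are placeholders.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import MultiStepLR

# Placeholder module standing in for Uni3DETR.
model = torch.nn.Linear(256, 256)

# Reported recipe: AdamW at 1.67e-4; 10x decay on the 70th and 80th epochs.
optimizer = AdamW(model.parameters(), lr=1.67e-4)
scheduler = MultiStepLR(optimizer, milestones=[70, 80], gamma=0.1)

for epoch in range(90):
    # ... iterate over batches of size 32, compute the detection losses,
    # call loss.backward(), then:
    optimizer.step()   # placeholder for the per-iteration update
    scheduler.step()   # steps once per epoch; LR drops 10x at 70 and 80
```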