SPGroup3D: Superpoint Grouping Network for Indoor 3D Object Detection

Authors: Yun Zhu, Le Hui, Yaqi Shen, Jin Xie

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate our method achieves state-of-the-art performance on the ScanNet V2, SUN RGB-D, and S3DIS datasets in indoor one-stage 3D object detection.
Researcher Affiliation | Academia | Yun Zhu¹, Le Hui², Yaqi Shen¹, Jin Xie¹*; ¹PCA Lab, School of Computer Science and Engineering, Nanjing University of Science and Technology, China; ²Shaanxi Key Laboratory of Information Acquisition and Processing, Northwestern Polytechnical University, China; zhu.yun@njust.edu.cn, huile@nwpu.edu.cn, syq@njust.edu.cn, csjxie@njust.edu.cn
Pseudocode | No | The paper does not contain any structured pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Source code is available at https://github.com/zyrant/SPGroup3D.
Open Datasets | Yes | Our SPGroup3D is evaluated on three challenging indoor 3D scene datasets, i.e., ScanNet V2 (Dai et al. 2017), SUN RGB-D (Song, Lichtenberg, and Xiao 2015), and S3DIS (Armeni et al. 2016).
Dataset Splits | Yes | For all datasets, we follow the standard data splits adopted in (Qi et al. 2019) and (Gwak, Choy, and Savarese 2020). ScanNet V2 is divided into 1,201 training samples, with the remaining 312 used for validation. SUN RGB-D is divided into approximately 5,000 training and 5,000 validation samples. For S3DIS, Area 5 is used for validation, while the remaining areas form the training subset. (A minimal sketch of these splits follows the table.)
Hardware Specification | Yes | The experiments are conducted on four NVIDIA RTX 3090 GPUs.
Software Dependencies | No | The paper mentions using the MMDetection3D framework but does not provide specific version numbers for it or for other software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | We set the voxel size to 0.02 m for all datasets. For the backbone, we use the same backbone introduced in (Wang et al. 2022a) as the voxel feature extractor, and the voxel size of the high-resolution output is 0.04 m. For the superpoint-based grouping, we set the iteration number to 3 and the neighbour number to 8. Moreover, following the setting in FCAF3D (Rukhovich, Vorontsova, and Konushin 2022), we set the number of positive samples to 18 in multiple matching. Following the approach in (Rukhovich, Vorontsova, and Konushin 2023), we employ the AdamW optimizer (Kingma and Ba 2014) with batch size, initial learning rate, and weight decay set to 4, 0.001, and 0.0001, respectively. Training is performed for 15 epochs on each dataset, with the learning rate decayed by a factor of 10 at the 9th and 12th epochs. (A sketch of this optimizer and schedule follows the table.)
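
For reference, the standard splits quoted in the Dataset Splits row can be written down as a small configuration. This is a minimal illustrative sketch in Python; the dictionary name and layout are our own, not the authors' configuration format, and the SUN RGB-D counts are approximate as stated in the paper.

# Minimal sketch of the standard splits quoted above. The variable name
# and structure are illustrative, not taken from the SPGroup3D codebase.
DATASET_SPLITS = {
    "ScanNetV2": {"train": 1201, "val": 312},   # (Dai et al. 2017)
    "SUN RGB-D": {"train": 5000, "val": 5000},  # approximate counts
    "S3DIS": {"train": "Areas 1-4 and 6", "val": "Area 5"},
}

# Sanity check: the two ScanNet V2 subsets account for 1,513 scenes in total.
assert DATASET_SPLITS["ScanNetV2"]["train"] + DATASET_SPLITS["ScanNetV2"]["val"] == 1513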
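
The optimizer and schedule described in the Experiment Setup row map directly onto standard PyTorch components. The following is a minimal sketch under stated assumptions: the model is a placeholder module, the milestones assume 1-indexed epochs with the scheduler stepped once per epoch, and the actual training loop (MMDetection3D, batch size 4 across four RTX 3090s) is not reproduced here.

import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import MultiStepLR

model = nn.Linear(3, 3)  # placeholder for the SPGroup3D network

# AdamW with the reported initial learning rate and weight decay.
optimizer = AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)

# Decay the learning rate by a factor of 10 at the 9th and 12th epochs
# of 15 total (assuming 1-indexed epochs, one scheduler step per epoch).
scheduler = MultiStepLR(optimizer, milestones=[9, 12], gamma=0.1)

for epoch in range(1, 16):  # 15 epochs
    # ... one training epoch (batch size 4) would run here ...
    optimizer.step()   # stands in for the per-iteration parameter updates
    scheduler.step()
    print(f"epoch {epoch:2d}: lr = {scheduler.get_last_lr()[0]:.0e}")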