Diffusion-SS3D: Diffusion Model for Semi-supervised 3D Object Detection
Authors: Cheng-Ju Ho, Chen-Hsuan Tai, Yen-Yu Lin, Ming-Hsuan Yang, Yi-Hsuan Tsai
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on the ScanNet and SUN RGB-D benchmark datasets to demonstrate that our approach achieves state-of-the-art performance against existing methods. |
| Researcher Affiliation | Collaboration | Cheng-Ju Ho¹, Chen-Hsuan Tai¹, Yen-Yu Lin¹, Ming-Hsuan Yang²,³, Yi-Hsuan Tsai³ (¹National Yang Ming Chiao Tung University, ²University of California at Merced, ³Google) |
| Pseudocode | Yes | "Algorithm 1: Teacher Model" and "Algorithm 2: Student Model" |
| Open Source Code | Yes | The source code will be available at https://github.com/luluho1208/Diffusion-SS3D. |
| Open Datasets | Yes | We evaluate our method on two benchmarks, including the ScanNet [12] and SUN RGB-D [46] datasets, with the evaluation settings adopted in the prior semi-supervised 3D object detection works [16, 53, 63]. |
| Dataset Splits | Yes | We split both benchmarks into labeled and unlabeled data for SSL, using labeled data ratios of 5%, 10%, and 20% for ScanNet and 1%, 5%, and 10% for SUN RGB-D. A sketch of such a split follows the table. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts) are mentioned for the experiments. |
| Software Dependencies | No | No software versions or library dependencies are specified; the paper only names the architecture: "In this work, we employ PointNet++ [31] as the encoder and IoU-aware VoteNet [53] as the decoder." |
| Experiment Setup | Yes | In the pre-training phase, we only use labeled data with a batch size of 4 to train the diffusion model. The model is trained for 900 epochs with an initial learning rate of 0.005. Like [16, 53], the learning rate then decays at the 400th, 600th, and 800th epochs with a factor of 0.1. In the phase of semi-supervised learning, a batch is composed of 4 labeled and 8 unlabeled data. The pre-trained model is used for initializing both the teacher and student models. The student model is trained for 1,000 epochs using the AdamW optimizer, with an initial learning rate of 0.005. Like [16, 53], the learning rate decays at the 400th, 600th, 800th, and 900th epochs with factors of 0.3, 0.3, 0.1, and 0.1, respectively. For the diffusion process, we set the maximum timesteps to 1000 and the number of proposal boxes N_b to 128. A sketch of this learning-rate schedule follows the table. |
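
The Dataset Splits row can be illustrated with a random partition of scene indices at a fixed labeled ratio. This is a minimal sketch, not the authors' splitting code: the function name, the fixed seed, and the ScanNet training-set size of 1,201 scenes are assumptions made for illustration.

```python
import random

def split_labeled_unlabeled(num_scenes: int, labeled_ratio: float, seed: int = 0):
    """Randomly partition scene indices into labeled and unlabeled subsets."""
    rng = random.Random(seed)            # fixed seed so the split is reproducible
    indices = list(range(num_scenes))
    rng.shuffle(indices)
    n_labeled = round(num_scenes * labeled_ratio)
    return indices[:n_labeled], indices[n_labeled:]

# Hypothetical example: a 5% labeled split over ScanNet's 1,201 training scenes.
labeled, unlabeled = split_labeled_unlabeled(1201, 0.05)
print(len(labeled), len(unlabeled))      # 60 1141
```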
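
The SSL-phase schedule in the Experiment Setup row uses different decay factors at different milestones (0.3, 0.3, 0.1, 0.1), which PyTorch's `MultiStepLR` cannot express directly because it applies one shared factor at every milestone; `LambdaLR` with a cumulative multiplier can. The sketch below assumes a standard PyTorch training loop with a placeholder model and is not the authors' released code.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

# Milestone epoch -> decay factor, as quoted for the SSL phase.
DECAYS = {400: 0.3, 600: 0.3, 800: 0.1, 900: 0.1}

def lr_multiplier(epoch: int) -> float:
    """Cumulative product of every factor whose milestone has been reached."""
    mult = 1.0
    for milestone, factor in DECAYS.items():
        if epoch >= milestone:
            mult *= factor
    return mult

model = torch.nn.Linear(8, 8)                  # placeholder for the student model
optimizer = AdamW(model.parameters(), lr=0.005)
scheduler = LambdaLR(optimizer, lr_lambda=lr_multiplier)

for epoch in range(1000):
    # ... one epoch over batches of 4 labeled + 8 unlabeled scenes ...
    optimizer.step()                           # stand-in for the real update
    scheduler.step()
```

Under this schedule the learning rate drops from 0.005 to 0.0015 at epoch 400, 4.5e-4 at 600, 4.5e-5 at 800, and 4.5e-6 at 900, reading the quoted factors as applied cumulatively (an interpretation the paper does not state explicitly).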