DiffuBox: Refining 3D Object Detection with Point Diffusion
Authors: Xiangyu Chen, Zhenzhen Liu, Katie Luo, Siddhartha Datta, Adhitya Polavaram, Yan Wang, Yurong You, Boyi Li, Marco Pavone, Wei-Lun (Harry) Chao, Mark Campbell, Bharath Hariharan, Kilian Q. Weinberger
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate this approach under various domain adaptation settings, and our results reveal significant improvements across different datasets, object classes and detectors. |
| Researcher Affiliation | Collaboration | 1Cornell University 2University of Oxford 3NVIDIA Research 4The Ohio State University |
| Pseudocode | Yes | S1 Algorithmic Description of DiffuBox: Below we provide an algorithmic description of the training and inference workflow of DiffuBox. Algorithm 1 DiffuBox Training; Algorithm 2 DiffuBox Inference |
| Open Source Code | Yes | Our PyTorch implementation is available at https://github.com/cxy1997/DiffuBox. |
| Open Datasets | Yes | We primarily consider three datasets: The KITTI dataset [6], the Lyft Level 5 Perception dataset [16], and the Ithaca365 dataset [5]. |
| Dataset Splits | Yes | For KITTI, we follow the official splits. For Lyft, we follow various existing works [58, 59, 24] and use the splits separated by geographical locations, consisting of 11,873 point clouds for training and 4,901 for testing. For Ithaca365, we utilize the annotated point clouds with 4,445 for training and 1,644 for testing. |
| Hardware Specification | Yes | We use NVIDIA A6000 for all of our experiments. |
| Software Dependencies | No | For detectors, we use the implementation and configurations from OpenPCDet [44]. For diffusion models, we use [14]'s implementation and follow their noise schedule σmax = 80. The paper mentions PyTorch in the abstract, but specific version numbers for these software components are not provided. |
| Experiment Setup | Yes | We set the context limit to 4x the bounding box size. We use shape weight 0.1 for cars and pedestrians, and 0.01 for cyclists... In the diffusion process, we follow [13] and use noise level distribution ln σ ~ N(1.2, 1.2^2), ODE schedule σ(t) = t, and 2nd order Heun solver. The denoiser transformer model contains 12 self-attention layers with hidden size 1024; each layer has 2048 intermediate dimensions and 8 heads. The diffusion model is trained with batch size 128 and learning rate 0.0001 for 100k steps. |
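The diffusion setup quoted above (noise level distribution ln σ ~ N(1.2, 1.2²), ODE schedule σ(t) = t, 2nd-order Heun solver) follows the EDM-style formulation. A minimal sketch of those two pieces is given below; this is an illustrative reconstruction, not the authors' implementation, and the names `sample_training_sigma`, `heun_sampler`, and the denoiser interface `denoise(x, sigma)` are assumptions.

```python
import numpy as np

def sample_training_sigma(batch_size, rng, p_mean=1.2, p_std=1.2):
    """Draw per-sample training noise levels with ln sigma ~ N(p_mean, p_std^2),
    matching the distribution quoted in the experiment setup."""
    return np.exp(rng.normal(p_mean, p_std, size=batch_size))

def heun_sampler(denoise, x, sigmas):
    """2nd-order Heun ODE solver under the schedule sigma(t) = t.

    `denoise(x, sigma)` is assumed to return the denoised estimate D(x; sigma);
    the probability-flow ODE then gives dx/dt = (x - D(x; t)) / t.
    `sigmas` is a decreasing sequence of noise levels ending near (or at) 0.
    """
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        d = (x - denoise(x, sigma)) / sigma           # slope at current sigma
        x_euler = x + (sigma_next - sigma) * d        # Euler predictor step
        if sigma_next > 0:                            # Heun corrector step
            d_next = (x_euler - denoise(x_euler, sigma_next)) / sigma_next
            x = x + (sigma_next - sigma) * 0.5 * (d + d_next)
        else:                                         # last step: Euler only
            x = x_euler
    return x
```

With a trivial denoiser that always predicts zero, the ODE dx/dt = x/t has the linear solution x(t) ∝ t, which Heun integrates exactly; in DiffuBox the denoiser would instead be the 12-layer transformer described in the setup, operating on bounding-box parameters.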