DiffuBox: Refining 3D Object Detection with Point Diffusion

Authors: Xiangyu Chen, Zhenzhen Liu, Katie Luo, Siddhartha Datta, Adhitya Polavaram, Yan Wang, Yurong You, Boyi Li, Marco Pavone, Wei-Lun (Harry) Chao, Mark Campbell, Bharath Hariharan, Kilian Q. Weinberger

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate this approach under various domain adaptation settings, and our results reveal significant improvements across different datasets, object classes and detectors."
Researcher Affiliation | Collaboration | 1 Cornell University, 2 University of Oxford, 3 NVIDIA Research, 4 The Ohio State University
Pseudocode | Yes | "S1 Algorithmic Description of DiffuBox: Below we provide an algorithmic description of the training and inference workflow of DiffuBox. Algorithm 1 DiffuBox Training; Algorithm 2 DiffuBox Inference." (An illustrative training-step sketch follows the table.)
Open Source Code | Yes | "Our PyTorch implementation is available at https://github.com/cxy1997/DiffuBox."
Open Datasets | Yes | "We primarily consider three datasets: the KITTI dataset [6], the Lyft Level 5 Perception dataset [16], and the Ithaca365 dataset [5]."
Dataset Splits | Yes | "For KITTI, we follow the official splits. For Lyft, we follow various existing works [58, 59, 24] and use the splits separated by geographical locations, consisting of 11,873 point clouds for training and 4,901 for testing. For Ithaca365, we utilize the annotated point clouds with 4,445 for training and 1,644 for testing."
Hardware Specification | Yes | "We use NVIDIA A6000 for all of our experiments."
Software Dependencies | No | "For detectors, we use the implementation and configurations from OpenPCDet [44]. For diffusion models, we use [14]'s implementation and follow their noise schedule σmax = 80." The paper mentions PyTorch in the abstract, but specific version numbers for these software components are not provided.
Experiment Setup | Yes | "We set the context limit to 4x the bounding box size. We use shape weight 0.1 for cars and pedestrians, and 0.01 for cyclists... In the diffusion process, we follow [13] and use noise level distribution ln σ ~ N(1.2, 1.2^2), ODE schedule σ(t) = t, and 2nd order Heun solver. The denoiser transformer model contains 12 self-attention layers with hidden size 1024; each layer has 2048 intermediate dimensions and 8 heads. The diffusion model is trained with batch size 128 and learning rate 0.0001 for 100k steps." (Illustrative sampler and denoiser-backbone sketches follow the table.)
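
The pseudocode row above points to Algorithms 1 and 2 in the paper's appendix (S1), which describe the training and inference workflow. As an illustration only, here is a minimal sketch of an EDM-style denoiser training step consistent with the quoted noise-level distribution ln σ ~ N(1.2, 1.2^2); the `denoiser` module, the box-frame normalization of the points, and the loss weighting are simplified assumptions rather than the authors' implementation.

```python
import torch

def training_step(denoiser, points, optimizer, p_mean=1.2, p_std=1.2):
    """Hypothetical EDM-style training step (not the authors' code).

    points: (B, N, 3) object point clouds, assumed already normalized into a
    box-relative frame; denoiser(noisy_points, sigma) is assumed to return a
    denoised estimate of the points.
    """
    # Sample per-sample noise levels with ln sigma ~ N(1.2, 1.2^2), as quoted above.
    sigma = torch.exp(p_mean + p_std * torch.randn(points.shape[0], 1, 1))
    noisy = points + torch.randn_like(points) * sigma
    denoised = denoiser(noisy, sigma)
    loss = ((denoised - points) ** 2).mean()  # plain MSE; EDM loss weighting omitted
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```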
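
The experiment-setup row quotes the sampling configuration of [13]: ODE schedule σ(t) = t solved with a 2nd-order Heun solver, with σmax = 80 per the software-dependencies row. Below is a minimal sketch of such a Heun sampler, assuming the same hypothetical `denoiser(x, sigma)` interface; the number of steps and the σ spacing are illustrative.

```python
import torch

@torch.no_grad()
def heun_sample(denoiser, x, sigmas):
    """Hypothetical 2nd-order Heun ODE sampler with schedule sigma(t) = t."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        d = (x - denoiser(x, sigma)) / sigma        # dx/dsigma at the current noise level
        x_euler = x + (sigma_next - sigma) * d      # Euler (1st-order) step
        if sigma_next > 0:
            # Heun correction: average the slopes at sigma and sigma_next.
            d_next = (x_euler - denoiser(x_euler, sigma_next)) / sigma_next
            x = x + (sigma_next - sigma) * 0.5 * (d + d_next)
        else:
            x = x_euler
    return x

# Illustrative noise schedule: 50 levels from sigma_max = 80 down to 0.
sigmas = torch.cat([torch.linspace(80.0, 0.02, steps=49), torch.zeros(1)])
```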
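
The denoiser backbone dimensions quoted in the experiment-setup row (12 self-attention layers, hidden size 1024, 2048 intermediate dimensions, 8 heads) map directly onto a standard PyTorch transformer encoder; the snippet below is an illustrative stand-in that omits the paper's input/output projections and noise-level conditioning.

```python
import torch.nn as nn

# Illustrative stand-in for the denoiser backbone described above (not the authors' model).
encoder_layer = nn.TransformerEncoderLayer(
    d_model=1024,          # hidden size
    nhead=8,               # attention heads
    dim_feedforward=2048,  # intermediate dimension per layer
    batch_first=True,
)
denoiser_backbone = nn.TransformerEncoder(encoder_layer, num_layers=12)
```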