DiffSF: Diffusion Models for Scene Flow Estimation

Authors: Yushan Zhang, Bastian Wandt, Maria Magnusson, Michael Felsberg

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on multiple benchmarks, FlyingThings3D [24], KITTI Scene Flow [25], and Waymo-Open [33], demonstrate state-of-the-art performance of our proposed method."
Researcher Affiliation | Academia | Yushan Zhang, Bastian Wandt, Maria Magnusson, Michael Felsberg; Linköping University; {firstname.lastname}@liu.se
Pseudocode | Yes | Algorithm 1: Training; Algorithm 2: Sampling
Open Source Code | Yes | "The code is available at https://github.com/ZhangYushan3/DiffSF."
Open Datasets | Yes | "We follow the most recent work in the field [43, 21, 5] and test the proposed method on three established benchmarks for scene flow estimation": FlyingThings3D [24], KITTI Scene Flow [25], and Waymo-Open [33].
Dataset Splits | No | The paper mentions training and testing sets, e.g., "The former consists of 20000 and 2000 scenes for training and testing, respectively" (for FlyingThings3D). However, it does not give details of a separate validation split, either as percentages or as sample counts.
Hardware Specification | Yes | "The proposed method is trained on 4 NVIDIA A40 GPUs."
Software Dependencies | No | The paper mentions the "AdamW optimizer" and the "PyTorch OneCycleLR learning rate scheduler" but does not give version numbers for these components or for any other major dependency such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | "We use the AdamW optimizer and a weight decay of 1 × 10^-4. The initial learning rate is set to 4 × 10^-4 for FlyingThings3D [24] and 1 × 10^-4 for Waymo-Open [33]. ... The model is trained for 600k iterations with a batch size of 24. ... The number of diffusion steps is set to 20 during training and 2 during inference. The number of nearest neighbors k in DGCNN and Local Transformer is set to 16. The number of global-cross transformer layers is set to 14. The number of feature channels is set to 128."
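The reported experiment setup can be collected into a single configuration object, which makes the quoted hyperparameters easier to check against a training script. The sketch below is illustrative only and is not taken from the authors' repository: the names `TrainConfig`, `INITIAL_LR`, and `build_optimizer_and_scheduler` are hypothetical. Passing the reported initial learning rate as `max_lr` to `OneCycleLR` is an assumption, since the summary does not say how the scheduler is parameterized. No learning rate is listed for KITTI Scene Flow, so none is included.

```python
from dataclasses import dataclass

# Initial learning rates as quoted in the Experiment Setup row.
# No value is reported for KITTI Scene Flow, so it is deliberately absent.
INITIAL_LR = {
    "FlyingThings3D": 4e-4,
    "Waymo-Open": 1e-4,
}


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the hyperparameters quoted above."""

    dataset: str
    weight_decay: float = 1e-4
    total_iterations: int = 600_000
    batch_size: int = 24
    diffusion_steps_train: int = 20
    diffusion_steps_inference: int = 2
    knn_k: int = 16                  # nearest neighbors in DGCNN / Local Transformer
    num_transformer_layers: int = 14  # global-cross transformer layers
    feature_channels: int = 128

    @property
    def initial_lr(self) -> float:
        # Raises KeyError for datasets without a reported learning rate.
        return INITIAL_LR[self.dataset]


def build_optimizer_and_scheduler(model, cfg: TrainConfig):
    """Build AdamW and OneCycleLR from cfg (requires PyTorch).

    Assumption: the reported learning rate is used as OneCycleLR's max_lr.
    """
    import torch  # imported lazily so the config can be used without PyTorch

    optimizer = torch.optim.AdamW(
        model.parameters(), lr=cfg.initial_lr, weight_decay=cfg.weight_decay
    )
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr=cfg.initial_lr, total_steps=cfg.total_iterations
    )
    return optimizer, scheduler
```

For example, `TrainConfig(dataset="Waymo-Open").initial_lr` gives the Waymo-Open learning rate, while requesting a dataset with no reported value fails instead of silently falling back to a guess.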