DiffSF: Diffusion Models for Scene Flow Estimation
Authors: Yushan Zhang, Bastian Wandt, Maria Magnusson, Michael Felsberg
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on multiple benchmarks, FlyingThings3D [24], KITTI Scene Flow [25], and Waymo-Open [33], demonstrate state-of-the-art performance of our proposed method. |
| Researcher Affiliation | Academia | Yushan Zhang Bastian Wandt Maria Magnusson Michael Felsberg Linköping University {firstname.lastname}@liu.se |
| Pseudocode | Yes | Algorithm 1: Training; Algorithm 2: Sampling |
| Open Source Code | Yes | The code is available at https://github.com/ZhangYushan3/DiffSF. |
| Open Datasets | Yes | We follow the most recent work in the field [43, 21, 5] and test the proposed method on three established benchmarks for scene flow estimation: FlyingThings3D [24], KITTI Scene Flow [25], and Waymo-Open [33]. |
| Dataset Splits | No | The paper mentions training and testing sets, e.g., "The former consists of 20000 and 2000 scenes for training and testing, respectively" (for FlyingThings3D). However, it does not explicitly describe a separate validation split with percentages or sample counts. |
| Hardware Specification | Yes | The proposed method is trained on 4 NVIDIA A40 GPUs. |
| Software Dependencies | No | The paper mentions the "AdamW optimizer" and the "PyTorch OneCycleLR learning rate scheduler" but does not provide specific version numbers for these software components or for any other major dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We use the AdamW optimizer and a weight decay of 1×10⁻⁴. The initial learning rate is set to 4×10⁻⁴ for FlyingThings3D [24] and 1×10⁻⁴ for Waymo-Open [33]. ... The model is trained for 600k iterations with a batch size of 24. ... The number of diffusion steps is set to 20 during training and 2 during inference. The number of nearest neighbors k in DGCNN and Local Transformer is set to 16. The number of global-cross transformer layers is set to 14. The number of feature channels is set to 128. |
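The optimizer and scheduler settings quoted above can be reproduced in PyTorch as follows. This is a minimal sketch under stated assumptions: the `nn.Linear` placeholder stands in for the actual DiffSF network (which is not shown here), and the FlyingThings3D learning rate of 4×10⁻⁴ is used; OneCycleLR's default `div_factor` and `pct_start` are assumed since the paper does not report them.

```python
import torch
from torch import nn, optim

# Placeholder model -- NOT the DiffSF architecture; the 128 matches
# only the reported number of feature channels.
model = nn.Linear(128, 128)

# AdamW with weight decay 1e-4, initial LR 4e-4 (FlyingThings3D),
# as reported in the Experiment Setup row.
optimizer = optim.AdamW(model.parameters(), lr=4e-4, weight_decay=1e-4)

# PyTorch OneCycleLR scheduler over the reported 600k iterations.
# max_lr is the peak; the warmup/annealing shape uses library defaults,
# which the paper does not specify.
scheduler = optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=4e-4, total_steps=600_000
)

# In a training loop one would call scheduler.step() once per iteration.
```

With OneCycleLR, the learning rate starts at `max_lr / div_factor` (default 25), ramps up to the peak, then anneals, so the effective initial rate here is 1.6×10⁻⁵ rather than 4×10⁻⁴.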