Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Spiral: Semantic-Aware Progressive LiDAR Scene Generation and Understanding

Authors: Dekai Zhu, Yixuan Hu, Youquan Liu, Dongyue Lu, Lingdong Kong, Slobodan Ilic

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on the SemanticKITTI and nuScenes datasets demonstrate that SPIRAL achieves state-of-the-art performance with the smallest parameter size, outperforming two-step methods that combine the generative and segmentation models. Additionally, we validate that range images generated by SPIRAL can be effectively used for synthetic data augmentation in the downstream segmentation training, significantly reducing the labeling effort on LiDAR data.
Researcher Affiliation	Academia	Dekai Zhu (1, 4), Yixuan Hu (1), Youquan Liu (2), Dongyue Lu (3), Lingdong Kong (3), Slobodan Ilic (1); (1) Technical University of Munich, (2) Fudan University, (3) National University of Singapore, (4) Munich Center for Machine Learning
Pseudocode	No	The paper describes the methodology with figures and textual explanations of steps (e.g., Unconditional Step, Conditional Step, Closed-Loop Inference) but does not provide a formally labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code	Yes	To ensure reproducibility, code and data are committed to be publicly available.
Open Datasets	Yes	We conduct an extensive experimental study on SemanticKITTI [3] and nuScenes [5] datasets and follow their official data splits. SemanticKITTI contains 23k annotated LiDAR scenes with 19 semantic classes, while nuScenes contains 28k LiDAR scenes with 16 semantic classes. During pre-processing, the LiDAR scenes are projected into range-view images of spatial resolutions 64 × 1024 and 32 × 1024, respectively. To further assess robustness, we also evaluate Spiral-based generative data augmentation on the fog and wet-ground subsets of Robo3D [24], which simulate adverse weather conditions for out-of-distribution testing.
Dataset Splits	Yes	We conduct an extensive experimental study on SemanticKITTI [3] and nuScenes [5] datasets and follow their official data splits.
Hardware Specification	Yes	We train SPIRAL on NVIDIA A6000 GPUs with 48 GB VRAM for 300k steps using the Adam optimizer [21] with a learning rate of 1e-4. The training process takes 36 hours. [...] We report the average sampling time per sample for LiDARGen [73], LiDM [48], R2DM [42], and SPIRAL on an A6000 GPU in Table 5.
Software Dependencies	No	The paper mentions using the Adam optimizer [21], a 4-layer Efficient U-Net [50] as the backbone, and adaptive group normalization (AdaGN) [9] modules. However, it does not provide specific version numbers for software libraries or frameworks like PyTorch, TensorFlow, or CUDA.
Experiment Setup	Yes	We train SPIRAL on NVIDIA A6000 GPUs with 48 GB VRAM for 300k steps using the Adam optimizer [21] with a learning rate of 1e-4. The training process takes 36 hours. [...] During inference, the number of function evaluations (NFE) [42], i.e., the number of sampling steps, is set to 256 for both Spiral and R2DM. [...] Empirically, we set ψ_c as 0.5 to balance the training of these two step types. [...] For instance, setting δ to 0.8 requires that over 80% of the pixels in y_t have prediction confidence scores exceeding 0.8. [...] We evaluate the performance of Spiral under different NFE settings in {32, 64, 128, 256, 512, 1024} on SemanticKITTI [3]. The results shown in Figure 7 indicate that Spiral's performance improves significantly when NFE < 256, while further increases in NFE yield only marginal gains on most metrics. Therefore, we set NFE = 256 as the default configuration. [...] The results listed in Table 4 indicate that Spiral performs well when δ ∈ {0.7, 0.8, 0.9} and achieves slightly best performance at δ = 0.8. Therefore, we set δ = 0.8 as the default configuration of Spiral.
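The Open Datasets row states that LiDAR scenes are projected into 64 × 1024 (SemanticKITTI) and 32 × 1024 (nuScenes) range-view images, but the quoted excerpt does not give the projection formula. The following is a minimal sketch assuming the standard spherical projection used by many range-image LiDAR pipelines. The function name project_to_range_image and the vertical field-of-view defaults (+3° / −25°, typical for a 64-beam sensor) are illustrative assumptions, not values taken from SPIRAL.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def project_to_range_image(
    points: Sequence[Point],
    height: int = 64,
    width: int = 1024,
    fov_up_deg: float = 3.0,      # assumed upper vertical FOV (64-beam style), not from the paper
    fov_down_deg: float = -25.0,  # assumed lower vertical FOV, not from the paper
) -> List[List[Optional[float]]]:
    """Spherically project (x, y, z) points into a height x width range image.

    Empty pixels stay None; when several points fall into the same pixel,
    the closest one (smallest range) is kept.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down
    image: List[List[Optional[float]]] = [[None] * width for _ in range(height)]

    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue  # skip degenerate points at the sensor origin
        yaw = math.atan2(y, x)    # azimuth in [-pi, pi]
        pitch = math.asin(z / r)  # elevation angle

        # Map azimuth to columns and elevation to rows (row 0 = top of FOV).
        u = 0.5 * (1.0 - yaw / math.pi) * width
        v = (1.0 - (pitch - fov_down) / fov) * height

        # Points outside the assumed FOV are clamped to the border pixels.
        col = min(width - 1, max(0, int(math.floor(u))))
        row = min(height - 1, max(0, int(math.floor(v))))

        current = image[row][col]
        if current is None or r < current:
            image[row][col] = r
    return image
```

For the 32 × 1024 nuScenes setting one would pass height=32 together with that sensor's own vertical field of view; the exact angles SPIRAL uses are not stated in this excerpt. Keeping the closest return per pixel is one common convention for resolving collisions, not necessarily the one used by the authors.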
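The Experiment Setup row describes the confidence threshold δ: with δ = 0.8, more than 80% of the pixels in y_t must have prediction confidence scores above 0.8. A minimal sketch of that check follows. It assumes the per-pixel confidences arrive as a 2D grid of floats (for example, the maximum class probability per pixel), and the name passes_confidence_check is hypothetical. How the result is used inside SPIRAL's closed-loop inference is not specified in this excerpt.

```python
from typing import Sequence


def passes_confidence_check(
    confidences: Sequence[Sequence[float]], delta: float = 0.8
) -> bool:
    """Return True if more than a `delta` fraction of pixels have confidence > `delta`.

    `confidences` is assumed to be a 2D grid of per-pixel confidence scores.
    The same delta serves as both the per-pixel threshold and the required fraction,
    as in the quoted example (delta = 0.8: over 80% of pixels above 0.8).
    """
    total = 0
    confident = 0
    for row in confidences:
        for score in row:
            total += 1
            if score > delta:  # strict: "exceeding" delta
                confident += 1
    if total == 0:
        return False  # no pixels, so the criterion cannot be met
    return confident / total > delta  # strict: "over" delta of the pixels
```

Both comparisons are strict, matching "over 80%" and "exceeding 0.8" in the quoted text.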