Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
CymbaDiff: Structured Spatial Diffusion for Sketch-based 3D Semantic Urban Scene Generation
Authors: Li Liang, Bo Miao, Xinyu Wang, Naveed Akhtar, Jordan Vice, Ajmal Mian
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on SketchSem3D demonstrate that CymbaDiff achieves superior semantic consistency, spatial realism, and cross-dataset generalization. |
| Researcher Affiliation | Academia | 1 The University of Western Australia 2 AIML, The University of Adelaide 3 The University of Melbourne |
| Pseudocode | No | The paper describes the model architecture and training process in prose and figures (e.g., Figure 2 and 3) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code and dataset will be available at https://github.com/Lillianresearch-hub/CymbaDiff. |
| Open Datasets | Yes | We introduce SketchSem3D, the first large-scale benchmark for generating 3D outdoor semantic scenes from abstract freehand sketches and pseudo-labeled annotations of satellite images. The code and dataset will be available at https://github.com/Lillianresearch-hub/CymbaDiff. |
| Dataset Splits | Yes | Sketch-based SemanticKITTI includes 58,172 training and 815 validation frames, while Sketch-based KITTI-360 consists of 33,892 training and 2,165 validation frames. Our model is trained on the Sketch-based SemanticKITTI training split from the SketchSem3D dataset. For evaluation, we use the validation splits of both the Sketch-based SemanticKITTI and Sketch-based KITTI-360 subsets, also from SketchSem3D. |
| Hardware Specification | Yes | All experiments were conducted on a single NVIDIA GeForce RTX 4090 GPU with 24 GB of RAM. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer and a Warmup Cosine LR scheduler, but does not provide specific version numbers for software libraries or frameworks like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | The Variational Autoencoder (VAE) was trained for 22 epochs using the AdamW optimizer with an initial learning rate of 3e-4. The VAE and the CymbaDiff denoising network were trained with a batch size of 2 and 4, each occupying approximately 20 GB of GPU memory. The CymbaDiff denoiser was trained for 31 epochs using the AdamW optimizer with a learning rate of 1e-3 and a weight decay of 1e-4. The number of denoising steps in CymbaDiff was set to 100. A Warmup Cosine LR scheduler was used in all training stages to gradually decrease the learning rate, which helped ensure stable convergence. |
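
The Experiment Setup row reports epochs, batch sizes, learning rates, and a warmup-cosine schedule, but not the warmup length or the minimum learning rate. The sketch below gathers the reported values into a plain Python config and shows one common form of linear warmup followed by cosine decay. It is an illustration under stated assumptions, not the authors' code: the names `REPORTED`, `steps_for`, and `warmup_cosine_lr`, the default `warmup_steps=500`, and `min_lr=0.0` are all hypothetical, and reading "a batch size of 2 and 4" as VAE = 2, denoiser = 4 is an interpretation of the quoted sentence.

```python
import math

# Values quoted in the table above; the dict layout itself is hypothetical.
REPORTED = {
    "train_frames": 58_172,  # Sketch-based SemanticKITTI training split
    "vae": {"epochs": 22, "batch_size": 2, "lr": 3e-4},
    "denoiser": {"epochs": 31, "batch_size": 4, "lr": 1e-3, "weight_decay": 1e-4},
    "denoising_steps": 100,
}


def steps_for(stage: str) -> int:
    """Total optimizer steps for a stage, assuming one pass over all frames per epoch."""
    cfg = REPORTED[stage]
    steps_per_epoch = math.ceil(REPORTED["train_frames"] / cfg["batch_size"])
    return steps_per_epoch * cfg["epochs"]


def warmup_cosine_lr(step: int, total_steps: int, base_lr: float,
                     warmup_steps: int = 500, min_lr: float = 0.0) -> float:
    """Linear warmup to base_lr, then cosine decay to min_lr.

    warmup_steps and min_lr are assumptions; the paper does not report them.
    """
    if step < warmup_steps:
        # Ramp linearly so the last warmup step reaches base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = steps_for("denoiser")
    for s in (0, 499, 500, total // 2, total):
        print(s, warmup_cosine_lr(s, total, REPORTED["denoiser"]["lr"]))
```

In a PyTorch training loop the same curve could be applied by setting each parameter group's learning rate from `warmup_cosine_lr` at every step; the scheduler implementation the authors actually used is not specified in the table.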