Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Discrete Spatial Diffusion: Intensity-Preserving Diffusion Modeling
Authors: Javier E. Santos, Agnese Marcato, Roman Colman, Nicholas Lubbers, Yen Ting Lin
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the expressive flexibility of DSD by performing image synthesis, class conditioning, and image inpainting across standard image benchmarks, while exactly conditioning total image intensity. We validate DSD on two challenging scientific applications: porous rock microstructures and lithium-ion battery electrodes, demonstrating its ability to generate structurally realistic samples under strict mass conservation constraints, with quantitative evaluation using state-of-the-art metrics for transport and electrochemical performance. |
| Researcher Affiliation | Academia | Javier E. Santos, Agnese Marcato, Roman Colman, Nicholas Lubbers, Yen Ting Lin; Los Alamos National Laboratory |
| Pseudocode | Yes | Algorithm 1 describes the DSD training pseudocode: Given the full transition probabilities p_t(x′, y′, c′ \| x, y, c), repeat: I_0 ← a sample drawn from the training set... Algorithm 2 (DSD inference): Given CFL condition number ε < 1 and desired total intensities in the color channels, initiate an image I_0 with desired total intensities in the color channels for... |
| Open Source Code | Yes | All code supporting this study is available at https://github.com/lanl/DiscreteSpatialDiffusion. |
| Open Datasets | Yes | We begin with MNIST, where intensity naturally corresponds to stroke thickness and digit area. In Fig. 3(a), we show inpainting results with a fixed mask with no-flux boundary conditions. During inference, particles inside the masked region rearrange to complete the digit structure, while the surrounding region remains fixed. Holding the visible structure constant, we varied the number of particles allowed to move, revealing that the total intensity governs which digit is most likely to emerge. This highlights the ability of DSD to incorporate hard constraints in downstream tasks such as inpainting. Additionally, we trained a conditional DSD model that employed a standard class-conditioning [69]. In Fig. 3(b), we illustrate the class-conditioned generated images with different total numbers of particles, varying from low, typical, to high total intensities. While these do exhibit some artifacts, DSD notably learns the spatial structure of the digits and generates bolder or lighter digits without saturating the upper bound of the intensity (i.e., 255 for uint8). This would not have been precisely realizable using conventional diffusion models. Comparisons with other conditioning approaches can be found in Appendix E. We also evaluated DSD on RGB datasets to explore scalability and generality. In Fig. 3(c,d), we show unconditional generations from models trained on CIFAR-10 and CelebA. Despite the discrete-state space and intensity-preserving dynamics, the model captures complex semantic structures, from lighting and textures to animals, vehicles, and human facial features. Appendix D includes large grids of generated samples, Fréchet Inception Distance (FID) and spatial FID (sFID) metrics, sampling ablations, and post-processing strategies for improving sample smoothness. |
| Dataset Splits | No | Algorithm 1 (DSD training): Given the full transition probabilities p_t(x′, y′, c′ \| x, y, c), repeat: I_0 ← a sample drawn from the training set... The paper uses well-known datasets that have standard splits, but it does not explicitly state the train/validation/test splits used by the authors for their experiments. |
| Hardware Specification | Yes | DSD models were negligibly slower than their Gaussian counterparts, and models were trained in this work using one A100 GPU in <750 hours per model (RGB images) and <50 hours per model (microstructures, MNIST). All models were run in a single NVIDIA A100 (or similar). |
| Software Dependencies | No | We employ the Noise Conditional Score Network (NCSN++) [1, 69] with two modifications: the final convolutional layer outputs 4 times the number of input channels (e.g., 3 for RGB) to represent four directions (up, down, left, right), and we use a Softplus activation function to ensure non-negativity in the predicted rates. The hyperparameters are listed in Appendix H. To quantify these characteristics, we compute the spatial correlation function and pore size distribution (PSD) using PoreSpy [34], a Python-based toolkit for quantitative analysis of porous media images. We computed these metrics using the Python library TauFactor [41], and the comparisons between the real and generated images based on these metrics are illustrated in Fig. 22, while a collection of the training data and generated images is in Fig. 21. |
| Experiment Setup | Yes | H. Hyperparameters for experiments: In our experiments, we thoroughly tested our model on various hyperparameters using the MNIST dataset. The MNIST dataset was chosen as a baseline for hyperparameter testing due to its low computational training cost. We found that our model was very robust with respect to the hyperparameters used, consistently generating quality generations without hyperparameter tuning. Due to limited compute, only limited tests were performed on CelebA, but we hypothesize that our model would perform well with different hyperparameters than the ones used. For the choice of our r, we chose a rate that was large enough to allow full degradation, enabling the model to learn to predict starting from full noise. See Table 3 for our hyperparameters used. Figure 22: (Top) Eight randomly picked samples from the NMC cathodes dataset. (Bottom) Random unconditional realizations of our model. Table 3: Hyperparameters used in all our experiments. All models were run in a single NVIDIA A100 (or similar). Columns: Dataset; r; Schedule; Boundary Condition; Loss; CFL Tolerance; Channel Multiplier; Training Iterations; Notes. Rows: (1) MNIST; 120; Ours; Periodic; Eq. 5; 0.15; (2,2,2); 100K; unconditional. (2) MNIST; 120; Ours; Periodic; Eq. 6; 0.15; (2,2,2); 90K; unconditional. (3) MNIST; 120; Ours; No-flux; Eq. 6; 0.15; (2,2,2); 80K; unconditional. (4) MNIST; 120; Ours; No-flux; Eq. 6; 0.11; (2,2,2); 70K; class-conditioned. (5) MNIST; 85; Ours; No-flux; Eq. 6; 0.07; (2,2,2); 40K; inpainting (14x14). (6) CIFAR10; 160; Ours; Periodic; Eq. 6; Fig. 11; (1,2,2,2); 10M; unconditional. (7) CelebA; 200; Ours; No-flux; Eq. 5; 0.05; (1,2,2,2); 3M; unconditional. (8) Electrodes; 200; x5; Periodic; Eq. 6; 0.01; (1,2,2,2); 180k; no notes. (9) Rocks; 250; x4; Periodic; Eq. 6; 0.1/0.2/0.05; (1,2,2,2); 50k; tolerance avoids overlapping particles. |
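
The quoted excerpts mention a network head that emits four non-negative directional rates per channel (up, down, left, right) through Softplus, a CFL number ε < 1, and periodic or no-flux boundaries. The NumPy sketch below shows one way these pieces could fit together in a single intensity-preserving jump step. It is not taken from the paper's Algorithm 2 or from the authors' repository: the function names, the direction-major channel layout, the rule `dt = eps / max_rate`, and the treatment of rates as per-particle jump rates are all assumptions made for illustration.

```python
import numpy as np

# (row shift, column shift) for up, down, left, right.
SHIFTS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def softplus(x):
    # Numerically stable log(1 + exp(x)); the result is never negative.
    return np.logaddexp(0.0, x)


def directional_rates(raw, channels):
    """Map raw head output (4*channels, H, W) to rates (4, channels, H, W).

    Assumption: the four directions are stacked direction-major along axis 0.
    """
    four_c, height, width = raw.shape
    if four_c != 4 * channels:
        raise ValueError("expected 4 * channels output maps")
    return softplus(raw).reshape(4, channels, height, width)


def jump_step(counts, rates, eps, rng, boundary="periodic"):
    """Move integer particles to neighbouring pixels; per-channel totals are kept.

    counts: (C, H, W) non-negative integers (pixel intensities as particles).
    rates:  (4, C, H, W) non-negative per-particle jump rates (assumption).
    eps:    CFL-like number in (0, 1) bounding the jump probability per step.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    total_rate = rates.sum(axis=0)
    # Hypothetical step-size rule: the fastest pixel jumps with probability eps.
    dt = eps / max(float(total_rate.max()), 1e-12)
    probs = rates * dt
    if boundary == "no-flux":
        # Forbid moves that would cross an image edge.
        probs[0][..., 0, :] = 0.0   # top row cannot move up
        probs[1][..., -1, :] = 0.0  # bottom row cannot move down
        probs[2][..., :, 0] = 0.0   # left column cannot move left
        probs[3][..., :, -1] = 0.0  # right column cannot move right
    elif boundary != "periodic":
        raise ValueError("boundary must be 'periodic' or 'no-flux'")

    remaining = counts.astype(np.int64)  # astype copies, so counts is untouched
    arrived = np.zeros_like(remaining)
    used = np.zeros_like(total_rate)
    for d, shift in enumerate(SHIFTS):
        # Sequential binomials reproduce a multinomial split over directions.
        cond_p = np.clip(probs[d] / (1.0 - used), 0.0, 1.0)
        moved = rng.binomial(remaining, cond_p)
        remaining -= moved
        used += probs[d]
        # With periodic boundaries np.roll wraps particles around the image.
        arrived += np.roll(moved, shift=shift, axis=(-2, -1))
    return remaining + arrived, dt
```

Because every particle either stays or lands in exactly one neighbouring pixel, the sum of `counts` over each channel is unchanged by construction, which is the property the paper refers to as intensity preservation. A call such as `jump_step(counts, directional_rates(raw, 3), 0.5, np.random.default_rng(0))` returns the updated counts together with the step size that was used.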