Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
On scalable and efficient training of diffusion samplers
Authors: Minkyu Kim, Kiyoung Seong, Dongyeop Woo, Sungsoo Ahn, Minsu Kim
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that SGDS, despite its simplicity, produces substantial gains over baseline diffusion samplers across benchmarks: classical Gaussian mixtures and the Manywell task; particle simulation problems like LJ-13 and LJ-55; and real-world molecules, Alanine Di-, Tri-, and Tetra-peptide. Our method significantly improves sample efficiency and scalability, marking a practical path towards high-dimensional diffusion-based inference. |
| Researcher Affiliation | Academia | Korea Advanced Institute of Science and Technology (KAIST); Mila Quebec AI Institute |
| Pseudocode | Yes | Algorithm 1 Training search-guided diffusion samplers (SGDS) |
| Open Source Code | Yes | Source code: https://github.com/minkyu1022/SGDS |
| Open Datasets | Yes | The reference samples can be downloaded from https://zenodo.org/records/15436773. |
| Dataset Splits | No | The paper describes methods for generating samples (e.g., MCMC chains, burn-in steps) and training on those generated samples, but it does not specify traditional train/validation/test splits of a pre-existing dataset. |
| Hardware Specification | No | The paper mentions memory limitations for PIS that require halving batch sizes due to the forward SDE computational graph, implying GPU memory constraints. However, it does not specify any particular GPU models, CPU models, or other hardware components used for running experiments. |
| Software Dependencies | No | The paper mentions "TorchANI [16], a PyTorch implementation of ANI deep learning potentials", indicating the use of PyTorch and TorchANI. However, specific version numbers for these software dependencies are not provided. |
| Experiment Setup | Yes | In all the experiments, we use four different random seeds and average the results across runs. We provide details of experimental settings in Appendix A.4, Table 4, and Table 5, and additional results in Appendix B. All methods adopt the PIS architecture [47, 39], with a joint network consisting of a two-layer MLP with 256 hidden dimensions. We run 25K epochs in both the first round and the second round. We train PIS at a learning rate of 1e-4, TB at a learning rate of 2e-4, and SGDS at a learning rate of 5e-4. For all methods except PIS, we use batch sizes of 4 and 32 in LJ-13 and LJ-55, respectively. |
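The hyperparameters listed in the Experiment Setup row can be collected into a small run configuration. The sketch below is illustrative only and is not taken from the SGDS repository: the names `RunConfig`, `make_configs`, and `average_over_seeds` are hypothetical, and the seed values 0 to 3 are an assumption (the row states only that four seeds were used). PIS batch sizes are left unset because this row does not give them; the Hardware Specification row notes only that they were halved for memory reasons.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional

# Values reported in the Experiment Setup row; all names here are hypothetical.
LEARNING_RATES = {"PIS": 1e-4, "TB": 2e-4, "SGDS": 5e-4}
BATCH_SIZES = {"LJ-13": 4, "LJ-55": 32}  # stated for all methods except PIS
NUM_SEEDS = 4
EPOCHS_PER_ROUND = 25_000
NUM_ROUNDS = 2
HIDDEN_DIM = 256
NUM_MLP_LAYERS = 2


@dataclass(frozen=True)
class RunConfig:
    method: str
    task: str
    learning_rate: float
    batch_size: Optional[int]  # None where the row gives no value
    seed: int
    epochs_per_round: int = EPOCHS_PER_ROUND
    num_rounds: int = NUM_ROUNDS
    hidden_dim: int = HIDDEN_DIM
    num_mlp_layers: int = NUM_MLP_LAYERS


def make_configs(method: str, task: str) -> List[RunConfig]:
    """Build one config per seed for a (method, task) pair."""
    if method not in LEARNING_RATES:
        raise ValueError(f"unknown method: {method}")
    # PIS batch sizes (and batch sizes for tasks other than LJ-13/LJ-55)
    # are not stated in this row, so they are left as None.
    batch = None if method == "PIS" else BATCH_SIZES.get(task)
    # Assumption: seeds are 0..NUM_SEEDS-1; the actual seed values are not reported.
    return [
        RunConfig(method, task, LEARNING_RATES[method], batch, seed)
        for seed in range(NUM_SEEDS)
    ]


def average_over_seeds(metrics: List[float]) -> float:
    """Average one metric over the per-seed runs, as the row describes."""
    if len(metrics) != NUM_SEEDS:
        raise ValueError(f"expected {NUM_SEEDS} values, got {len(metrics)}")
    return mean(metrics)
```

For example, `make_configs("SGDS", "LJ-55")` yields four configs with learning rate 5e-4 and batch size 32, and the reported number for a metric would be `average_over_seeds` applied to the four per-seed values.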