Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Posterior Sampling by Combining Diffusion Models with Annealed Langevin Dynamics

Authors: Zhiyang Xun, Shivam Gupta, Eric Price

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	To validate our theoretical analysis and assess real-world performance, we study three inverse problems on FFHQ 256 [KLA21]: inpainting, 4× super-resolution, and Gaussian deblurring. Experiments use 1k validation images and the pre-trained diffusion model from [CKM+23]. Forward operators are specified as in [CKM+23]: inpainting masks 30%–70% of pixels uniformly at random; super-resolution downsamples by a factor of 4; deblurring convolves the ground-truth with a Gaussian kernel of size 61 × 61 (std. 3.0). We first obtain initial reconstructions x0 via Diffusion Posterior Sampling (DPS) [DS24], then refine them with our annealed Langevin sampler to draw samples close to p(x | x0, y). To control runtime, we sweep the step size while keeping the annealing schedule fixed. For each step size, we report the per-image L2 distance to the ground truth and the FID of the resulting sample distribution (Figure 4). Across all three tasks, increasing the time devoted to annealed Langevin decreases L2 but increases FID; in the inpainting setting, when the step size is sufficiently small, our method surpasses DPS on both metrics. Qualitatively, our reconstructions better preserve ground-truth attributes compared to DPS (Figures 5 and 6).
Researcher Affiliation	Collaboration	Zhiyang Xun (UT Austin); Shivam Gupta (UT Austin); Eric Price (UT Austin & Microsoft Research)
Pseudocode	Yes	Algorithm 1 Sampling from p(x | Ax + N(0, η^2 I_m) = y) ... Algorithm 2 Sampling from p(x | x0, y) given an extra Gaussian measurement x0 ... Algorithm 3 Competitive Compressed Sensing Algorithm Given a Rough Estimation
Open Source Code	Yes	Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We provide open access to the code. The datasets we use are open access.
Open Datasets	Yes	To validate our theoretical analysis and assess real-world performance, we study three inverse problems on FFHQ 256 [KLA21]: inpainting, 4× super-resolution, and Gaussian deblurring. ... The datasets we use are open access.
Dataset Splits	Yes	Experiments use 1k validation images and the pre-trained diffusion model from [CKM+23].
Hardware Specification	Yes	All experiments were run on a cluster with four NVIDIA A100 GPUs and required roughly two hours per task.
Software Dependencies	No	The paper does not explicitly list specific software versions (e.g., Python, PyTorch version numbers) used for implementation. It mentions a 'pre-trained diffusion model from [CKM+23]' but this refers to a model/paper, not a specific software dependency with a version number.
Experiment Setup	Yes	Forward operators are specified as in [CKM+23]: inpainting masks 30%–70% of pixels uniformly at random; super-resolution downsamples by a factor of 4; deblurring convolves the ground-truth with a Gaussian kernel of size 61 × 61 (std. 3.0). ... To control runtime, we sweep the step size while keeping the annealing schedule fixed.
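
For readers trying to reproduce the setup, the three forward operators quoted in the table can be sketched roughly as follows. This is an illustrative NumPy sketch, not the authors' code: the function names are hypothetical, images are treated as single-channel 2D arrays, downsampling is done by average pooling, and the blur uses circular boundary handling. All of these are assumptions that the quoted text does not specify.

```python
import numpy as np


def inpainting_mask(shape, rng, low=0.3, high=0.7):
    """Binary mask (1 = observed, 0 = hidden).

    The hidden fraction is drawn uniformly from [low, high], and the hidden
    pixels are then chosen independently at random (an assumption).
    """
    frac = rng.uniform(low, high)
    return (rng.random(shape) >= frac).astype(np.float64)


def downsample(x, factor=4):
    """Downsample an (H, W) image by average pooling (assumed choice)."""
    h, w = x.shape
    return x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def gaussian_kernel(size=61, std=3.0):
    """Normalized size x size Gaussian kernel."""
    ax = np.arange(size) - (size - 1) / 2
    g = np.exp(-0.5 * (ax / std) ** 2)
    k = np.outer(g, g)
    return k / k.sum()


def gaussian_blur(x, size=61, std=3.0):
    """Blur via FFT-based circular convolution (boundary handling assumed)."""
    padded = np.zeros_like(x, dtype=np.float64)
    padded[:size, :size] = gaussian_kernel(size, std)
    # Move the kernel center to index (0, 0) so the output is not shifted.
    padded = np.roll(padded, (-(size // 2), -(size // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(padded)))
```

A measurement for any of the three tasks would then be the operator output plus Gaussian noise, matching the model y = Ax + N(0, η^2 I_m) named in Algorithm 1; the noise level is not stated in the quoted text.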