Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Reverse Diffusion Sequential Monte Carlo Samplers

Authors: Luhuan Wu, Yi Han, Christian Andersson Naesseth, John P. Cunningham

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the effectiveness of our method on a range of synthetic targets and real-world Bayesian inference problems. 2. 5 Experiments. We evaluate RDSMC on a range of synthetic and real-world target distributions, comparing it to SMC [26], AIS [25], SMS [33], RDMC [8], and SLIPS [9].
Researcher Affiliation	Academia	Luhuan Wu Columbia University Yi Han Columbia University Christian A. Naesseth University of Amsterdam John P. Cunningham Columbia University
Pseudocode	Yes	Algorithm 1: Reverse Diffusion Sequential Monte Carlo (RDSMC). Input: Unnormalized target π(x_0), number of particles N, discretization steps T (with step size δ = 1/T), diffusion schedule {α_t, σ_t, f_t, g_t}, base distribution q(x_T), and additional inputs for score and marginal estimation η. Output: Weighted samples {x_0^(i), w_0^(i)}_{i=1}^N and normalization constant estimate Ẑ.
Open Source Code	Yes	Our code is available at https://github.com/LuhuanWu/RDSMC.
Open Datasets	Yes	5.1 Bi-modal Gaussian mixtures. 5.2 Rings and Funnel distributions. 5.3 Bayesian logistic regression. Finally, we evaluate inference performance on Bayesian logistic regression models using four datasets. Credit and Cancer [35] involve predicting credit risk and breast cancer recurrence, while Ionosphere [36] and Sonar [37] focus on classifying radar and sonar signals, respectively.
Dataset Splits	Yes	The inference is performed on 60% of each dataset, leaving 20% for validation and 20% for testing.
Hardware Specification	Yes	Experiments are conducted on an NVIDIA RTX A6000 GPU, whereas our previous experiments use an NVIDIA A100 GPU.
Software Dependencies	No	PDDS is implemented in JAX (following the official implementation at https://github.com/angusphillips/particle_denoising_diffusion_sampler), while RDSMC and the other baselines are implemented in PyTorch. The paper does not provide specific version numbers for JAX or PyTorch.
Experiment Setup	Yes	Unless otherwise specified, we use T = 100 discretization steps for RDSMC and its variants, and T = 1,024 steps for other methods. We generate N = 4,096 final samples for all methods, and tune their hyperparameters assuming access to either a validation dataset or an oracle metric. For each method we select the hyperparameters based on the lowest estimation bias of the weight ratio. For each method, we select the hyperparameters with lowest Radius TVD on a heldout validation set for Rings and lowest Sliced KSD for Funnel. Each method is tuned using the validation log-likelihood estimate.
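
Illustration: the Dataset Splits row quotes a 60% / 20% / 20% partition into inference, validation, and test portions. The sketch below shows one way such a partition could be produced. It is not taken from the paper or its repository: the function name split_indices, the fixed seed, the shuffle-then-slice strategy, and the example row count of 1000 are all assumptions made for illustration.

```python
import random


def split_indices(n_rows, percents=(60, 20, 20), seed=0):
    """Shuffle row indices and cut them into inference/validation/test parts.

    Hypothetical helper: the paper reports only the 60/20/20 proportions,
    not how the rows were assigned to each part.
    """
    if sum(percents) != 100:
        raise ValueError("percents must sum to 100")
    indices = list(range(n_rows))
    # A fixed seed (an assumption here) makes the partition repeatable.
    random.Random(seed).shuffle(indices)
    # Integer arithmetic avoids floating-point rounding surprises.
    n_inference = n_rows * percents[0] // 100
    n_validation = n_rows * percents[1] // 100
    inference = indices[:n_inference]
    validation = indices[n_inference:n_inference + n_validation]
    # The test part takes the remainder, so no row is dropped by rounding.
    test = indices[n_inference + n_validation:]
    return inference, validation, test


# Example with a made-up dataset size of 1000 rows.
inference_idx, validation_idx, test_idx = split_indices(1000)
```

Recording the seed and the exact proportions alongside the results is what lets a reader regenerate the same partition; the validation portion is the one the Experiment Setup row refers to when it mentions tuning on a validation log-likelihood estimate.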