Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Parallelizing MCMC Across the Sequence Length

Authors: David Zoltowski, Skyler Wu, Xavier Gonzalez, Leo Kozachkov, Scott Linderman

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In several examples, we demonstrate the simulation of up to hundreds of thousands of MCMC samples with only tens of parallel Newton iterations. Additionally, we develop two new parallel quasi-Newton methods to evaluate nonlinear recursions with lower memory costs and reduced runtime. We find that the proposed parallel algorithms accelerate MCMC sampling across multiple examples, in some cases by more than an order of magnitude compared to sequential evaluation.
Researcher Affiliation	Collaboration	David M. Zoltowski Stanford University EMAIL Skyler Wu Stanford University EMAIL Xavier Gonzalez Stanford University EMAIL Leo Kozachkov Brown University EMAIL Scott W. Linderman Stanford University EMAIL [...] L.K. was a Goldstine Fellow at IBM Research while working on this paper.
Pseudocode	Yes	A Additional Algorithm Details Algorithm 1 Parallel evaluation of nonlinear sequence models using variants of DEER Algorithm 2 HMC Step with Parallel Leapfrog
Open Source Code	Yes	Our implementation is available at https://github.com/lindermanlab/parallel-mcmc.
Open Datasets	Yes	We first demonstrate parallelization of a reparameterized Gibbs sampler for the eight schools problem [3, 59] [...] We next evaluated parallel MALA, focusing on (1) the convergence rate of Newton's method and the wall-clock time relative to sequential MALA, and (2) its efficiency in generating useful samples. For parallel MALA, we used the stochastic quasi-DEER algorithm with a 1-sample estimate of the diagonal. We targeted the posterior of a Bayesian logistic regression (BLR) model of the German Credit Dataset with whitened covariates [60, 61] and a N(0, I) prior. [...] We encoded reviews from the IMDB dataset [65] into 768-dimensional embeddings using gemini-embedding-001 [66], partially inspired by Harrison et al. [67].
Dataset Splits	No	The paper mentions using 'synthetic data from this model' for the eight schools problem and '1024 randomly selected reviews' from the IMDB dataset, but does not provide explicit training/test/validation splits for any dataset used in the experiments.
Hardware Specification	Yes	Unless otherwise noted, all runs used a single H100 GPU on a SLURM cluster. [...] We investigated the convergence and wall-clock speed of DEER and efficient quasi-DEER on an A100 GPU [...] All experiments were run on an NVIDIA H100 GPU with 80 GB of RAM.
Software Dependencies	No	Our implementations were in JAX [58] with wall-clock times measured post-JIT compilation. [...] implemented in TensorFlow Probability [81]. [...] efficiency using effective sample size (ESS) per second using ArviZ [63].
Experiment Setup	Yes	For both parallel and sequential MALA, we use a learning rate of ϵ = 0.0015 [...] For parallel MALA, we set a max_iter of 50 + 5L 10^-4 [...] We set the absolute tolerance to 5 × 10^-4. [...] we swept across the number of leapfrog steps L = {4, 8, 12, 16, 20, 24, 32, 40, 64, 96} and step sizes ϵ = {0.005, 0.0075, 0.01, 0.02, 0.03, 0.04, 0.05, 0.075, 0.1}. [...] We simulated B = 4 chains with 4096 samples using sequential and parallel MALA with a step size of ϵ = 0.015. To handle the larger dimensionality, we applied the sliding window method from Section 3.4 (considering window sizes of 128, 256, 512, and 1024)
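
As a rough illustration of the idea quoted in the Research Type row (evaluating a nonlinear recursion with a small number of Newton iterations rather than one step at a time), the Python sketch below applies Newton's method to a toy scalar recursion. It is not the authors' implementation: the recursion `f`, its derivative `df`, the inputs, and the function names are all hypothetical and chosen only for illustration. The linearized recursion is solved here with a plain loop for readability, whereas a parallel implementation would evaluate that linear recursion with a parallel scan.

```python
import math


def sequential_rollout(f, s0, inputs):
    """Evaluate s_t = f(s_{t-1}, u_t) one step at a time (the baseline)."""
    states, s = [], s0
    for u in inputs:
        s = f(s, u)
        states.append(s)
    return states


def newton_rollout(f, df, s0, inputs, tol=5e-4, max_iter=100):
    """Newton iteration over the whole trajectory at once.

    Each iteration linearizes f around the current guess and solves the
    resulting *linear* recursion. The inner loop is sequential only for
    clarity; a linear recursion is what a parallel scan can evaluate.
    The default tol mirrors the 5e-4 absolute tolerance quoted in the
    Experiment Setup row; everything else is an assumption.
    """
    num_steps = len(inputs)
    guess = [0.0] * num_steps  # arbitrary initial trajectory
    for iteration in range(1, max_iter + 1):
        new, prev_new = [], s0
        for t in range(num_steps):
            prev_old = s0 if t == 0 else guess[t - 1]
            jac = df(prev_old, inputs[t])
            # First-order expansion of f around the previous guess.
            s = f(prev_old, inputs[t]) + jac * (prev_new - prev_old)
            new.append(s)
            prev_new = s
        err = max(abs(a - b) for a, b in zip(new, guess))
        guess = new
        if err < tol:
            return guess, iteration
    return guess, max_iter


# Toy recursion (hypothetical): s_t = tanh(0.9 * s_{t-1} + u_t).
def f(s, u):
    return math.tanh(0.9 * s + u)


def df(s, u):
    # Derivative of f with respect to s.
    return 0.9 * (1.0 - math.tanh(0.9 * s + u) ** 2)


inputs = [math.sin(0.1 * t) for t in range(200)]
reference = sequential_rollout(f, 0.0, inputs)
states, n_iter = newton_rollout(
    f, df, 0.0, inputs, tol=1e-10, max_iter=len(inputs) + 1
)
max_err = max(abs(a - b) for a, b in zip(states, reference))
print(f"Newton iterations: {n_iter}, max error vs sequential: {max_err:.2e}")
```

Each Newton iteration reproduces at least one more state of the sequential trajectory exactly, so the loop is guaranteed to finish within T + 1 iterations for a sequence of length T. For well-behaved recursions it typically needs far fewer, which is where a speedup can come from once the inner linear solve is parallelized.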