Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Parallelizing MCMC Across the Sequence Length
Authors: David Zoltowski, Skyler Wu, Xavier Gonzalez, Leo Kozachkov, Scott Linderman
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In several examples, we demonstrate the simulation of up to hundreds of thousands of MCMC samples with only tens of parallel Newton iterations. Additionally, we develop two new parallel quasi-Newton methods to evaluate nonlinear recursions with lower memory costs and reduced runtime. We find that the proposed parallel algorithms accelerate MCMC sampling across multiple examples, in some cases by more than an order of magnitude compared to sequential evaluation. |
| Researcher Affiliation | Collaboration | David M. Zoltowski Stanford University EMAIL Skyler Wu Stanford University EMAIL Xavier Gonzalez Stanford University EMAIL Leo Kozachkov Brown University EMAIL Scott W. Linderman Stanford University EMAIL [...] L.K. was a Goldstine Fellow at IBM Research while working on this paper. |
| Pseudocode | Yes | A Additional Algorithm Details Algorithm 1 Parallel evaluation of nonlinear sequence models using variants of DEER Algorithm 2 HMC Step with Parallel Leapfrog |
| Open Source Code | Yes | Our implementation is available at https://github.com/lindermanlab/parallel-mcmc. |
| Open Datasets | Yes | We first demonstrate parallelization of a reparameterized Gibbs sampler for the eight schools problem [3, 59] [...] We next evaluated parallel MALA, focusing on (1) the convergence rate of Newton's method and the wall-clock time relative to sequential MALA, and (2) its efficiency in generating useful samples. For parallel MALA, we used the stochastic quasi-DEER algorithm with a 1-sample estimate of the diagonal. We targeted the posterior of a Bayesian logistic regression (BLR) model of the German Credit Dataset with whitened covariates [60, 61] and a N(0, I) prior. [...] We encoded reviews from the IMDB dataset [65] into 768-dimensional embeddings using gemini-embedding-001 [66], partially inspired by Harrison et al. [67]. |
| Dataset Splits | No | The paper mentions using 'synthetic data from this model' for the eight schools problem and '1024 randomly selected reviews' from the IMDB dataset, but does not provide explicit training/test/validation splits for any dataset used in the experiments. |
| Hardware Specification | Yes | Unless otherwise noted, all runs used a single H100 GPU on a SLURM cluster. [...] We investigated the convergence and wall-clock speed of DEER and efficient quasi-DEER on an A100 GPU [...] All experiments were run on an NVIDIA H100 GPU with 80 GB of RAM. |
| Software Dependencies | No | Our implementations were in JAX [58] with wall-clock times measured post-JIT compilation. [...] implemented in TensorFlow Probability [81]. [...] efficiency using effective sample size (ESS) per second using ArviZ [63]. |
| Experiment Setup | Yes | For both parallel and sequential MALA, we use a learning rate of ϵ = 0.0015 [...] For parallel MALA, we set a max_iter of 50 + 5L 10^-4 [...] We set the absolute tolerance to 5 × 10^-4. [...] we swept across the number of leapfrog steps L = {4, 8, 12, 16, 20, 24, 32, 40, 64, 96} and step sizes ϵ = {0.005, 0.0075, 0.01, 0.02, 0.03, 0.04, 0.05, 0.075, 0.1}. [...] We simulated B = 4 chains with 4096 samples using sequential and parallel MALA with a step size of ϵ = 0.015. To handle the larger dimensionality, we applied the sliding window method from Section 3.4 (considering window sizes of 128, 256, 512, and 1024) |
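
To give a sense of the size of the sweep quoted in the Experiment Setup row, the sketch below enumerates the leapfrog-step and step-size values as a grid. It assumes the sweep is a full Cartesian product of the two quoted sets, which the excerpt does not confirm; the names `leapfrog_steps`, `step_sizes`, and `grid` are illustrative and do not come from the authors' repository.

```python
from itertools import product

# Values copied from the quoted Experiment Setup text.
leapfrog_steps = [4, 8, 12, 16, 20, 24, 32, 40, 64, 96]
step_sizes = [0.005, 0.0075, 0.01, 0.02, 0.03, 0.04, 0.05, 0.075, 0.1]

# Assumption: every leapfrog count is paired with every step size.
# The excerpt does not say whether the authors ran the full product.
grid = list(product(leapfrog_steps, step_sizes))

# Number of (L, epsilon) configurations under that assumption.
num_configs = len(grid)
print(num_configs)
```

Under the full-product assumption this yields 10 × 9 = 90 configurations; a partial sweep would give fewer.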