Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Metropolis Adjusted Microcanonical Hamiltonian Monte Carlo

Authors: Jakob Robnik, Reuben Cohn-Gordon, Uroš Seljak

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate that MAMS outperforms NUTS across the board on benchmark problems of varying complexity and dimensionality, achieving up to a factor of seven speedup. We test MAMS on standard benchmarks in Section 7 and find that it outperforms the state-of-the-art HMC with NUTS tuning by a factor of two at worst, and seven at best.
Researcher Affiliation	Academia	Jakob Robnik: Physics Department, University of California at Berkeley, Berkeley, CA 94720, USA; Reuben Cohn-Gordon: Physics Department, University of California at Berkeley, Berkeley, CA 94720, USA; Uroš Seljak: Physics Department, University of California at Berkeley and Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
Pseudocode	Yes	Algorithm 1: MAMS Langevin
Open Source Code	Yes	The algorithm is implemented in blackjax (Cabezas et al., 2024), applicable out-of-the-box, and is publicly available, together with documentation and tutorials [1]. The code for reproducing numerical experiments is also available [2]. [1] https://blackjax-devs.github.io/sampling-book/algorithms/mclmc.html [2] https://github.com/reubenharry/sampler-benchmarks
Open Datasets	Yes	Benchmarks Table 1 compares MAMS with NUTS and MALA on a set of benchmark problems, mostly adapted from the Inference Gym (Sountsov et al., 2020).
Dataset Splits	No	The paper does not explicitly provide training/test/validation dataset splits. It describes the problem setups for various benchmarks and states: 'To reduce variance in these results, we run at least 128 chains for each problem and take the median of the error across chains at each step.' This refers to sampling chains rather than dataset partitioning.
Hardware Specification	Yes	The experiments were run on 128 CPU cores, where each core is a 2x AMD EPYC 7763 (Milan) CPU.
Software Dependencies	No	The paper mentions 'blackjax (Cabezas et al., 2024)' for NUTS and 'numpyro (Phan et al., 2019b)' for the Stochastic Volatility problem. However, specific version numbers for these software dependencies are not provided.
Experiment Setup	Yes	MAMS has two hyperparameters, the step size ϵ and the trajectory length L, where L/ϵ is the (average) number of steps in a proposal's trajectory. The Langevin version of the algorithm has an additional hyperparameter L_partial that determines the partial refreshment strength during the proposal trajectories, i.e., the amount of Langevin noise. We set the target acceptance rate to 90%. In the first stage of tuning, we use a stochastic optimization scheme, dual averaging (Nesterov, 2009), as in Hoffman and Gelman (2014), to adapt the step size until the desired acceptance rate is achieved. We determine the proportionality constant of Equation (12) in a way that L equals the optimal L, determined by a grid search, for the standard Gaussian. We find a proportionality constant of 0.3 for MAMS without Langevin noise and 0.23 with Langevin noise. We will use the same setting for Langevin MAMS, so L_partial/L = 1.25.
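The first tuning stage in the Experiment Setup row (dual averaging that adapts the step size toward a 90% acceptance rate) can be illustrated with a minimal Python sketch of the dual averaging recursion described by Hoffman and Gelman (2014). This is an assumption-laden illustration, not the MAMS or blackjax implementation: it uses the default constants from Hoffman and Gelman (gamma = 0.05, t0 = 10, kappa = 0.75, mu = log(10 * eps0)), which may differ from what MAMS uses, and it replaces a real MAMS proposal with a hypothetical toy acceptance model, accept_prob(eps) = exp(-eps^2). The names dual_averaging_step_size and toy_accept are illustrative only.

```python
import math


def dual_averaging_step_size(accept_prob, eps0=1.0, target=0.9, n_iter=2000,
                             gamma=0.05, t0=10.0, kappa=0.75):
    """Adapt a step size so the acceptance rate approaches `target`.

    Sketch of dual averaging as described by Hoffman and Gelman (2014).
    `accept_prob(eps)` returns the acceptance probability observed for a
    proposal made with step size `eps` (here a stand-in for a MAMS proposal).
    """
    mu = math.log(10.0 * eps0)  # shrinkage point: iterates are biased toward 10 * eps0
    h_bar = 0.0                 # running average of (target - acceptance)
    log_eps = math.log(eps0)    # current (noisy) step-size iterate
    log_eps_bar = 0.0           # averaged iterate, returned at the end
    for m in range(1, n_iter + 1):
        # Acceptance probability of a proposal made with the current step size.
        alpha = accept_prob(math.exp(log_eps))
        # Update the running average of the acceptance-rate error.
        w = 1.0 / (m + t0)
        h_bar = (1.0 - w) * h_bar + w * (target - alpha)
        # Too many rejections (h_bar > 0) shrink the step size, and vice versa.
        log_eps = mu - math.sqrt(m) / gamma * h_bar
        # Average the iterates with a decaying weight m^(-kappa).
        eta = m ** (-kappa)
        log_eps_bar = eta * log_eps + (1.0 - eta) * log_eps_bar
    return math.exp(log_eps_bar)


def toy_accept(eps):
    # Hypothetical acceptance model: larger steps are rejected more often.
    return math.exp(-eps ** 2)


eps_tuned = dual_averaging_step_size(toy_accept)
print(f"tuned step size: {eps_tuned:.4f}, acceptance: {toy_accept(eps_tuned):.3f}")
```

With this toy model, the tuned step size should settle near sqrt(-ln 0.9), the value at which the acceptance probability is exactly 0.9. In a real sampler, the acceptance probability of each proposal is random, and averaging the iterates (log_eps_bar) is what smooths that noise out before the step size is frozen.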