Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Metropolis Adjusted Microcanonical Hamiltonian Monte Carlo

Authors: Jakob Robnik, Reuben Cohn-Gordon, Uros Seljak

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We demonstrate that MAMS outperforms NUTS across the board on benchmark problems of varying complexity and dimensionality, achieving up to a factor of seven speedup. We test MAMS on standard benchmarks in Section 7 and find that it outperforms the state-of-the-art HMC with NUTS tuning by a factor of two at worst, and seven at best.
Researcher Affiliation Academia Jakob Robnik Physics Department, University of California at Berkeley, Berkeley, CA 94720, USA EMAIL Reuben Cohn-Gordon Physics Department, University of California at Berkeley, Berkeley, CA 94720, USA EMAIL Uroš Seljak Physics Department, University of California at Berkeley and Lawrence Berkeley National Laboratory, Berkeley, Berkeley, CA 94720, USA EMAIL
Pseudocode Yes Algorithm 1: MAMS Langevin
Open Source Code Yes The algorithm is implemented in blackjax (Cabezas et al., 2024), applicable out-of-the-box, and is publicly available, together with documentation and tutorials [1]. The code for reproducing numerical experiments is also available [2]. [1] https://blackjax-devs.github.io/sampling-book/algorithms/mclmc.html [2] https://github.com/reubenharry/sampler-benchmarks
Open Datasets Yes Benchmarks Table 1 compares MAMS with NUTS and MALA on a set of benchmark problems, mostly adapted from the Inference Gym (Sountsov et al., 2020).
Dataset Splits No The paper does not explicitly provide training/test/validation dataset splits. It describes the problem setups for various benchmarks and states: 'To reduce variance in these results, we run at least 128 chains for each problem and take the median of the error across chains at each step.' This refers to sampling chains rather than dataset partitioning.
Hardware Specification Yes The experiments were run on 128 CPU cores, where each core is a 2x AMD EPYC 7763 (Milan) CPU.
Software Dependencies No The paper mentions 'blackjax (Cabezas et al., 2024)' for NUTS and 'numpyro (Phan et al., 2019b)' for the Stochastic Volatility problem. However, specific version numbers for these software dependencies are not provided.
Experiment Setup Yes MAMS has two hyperparameters, stepsize ϵ and the trajectory length L, where L/ϵ is the (average) number of steps in a proposal's trajectory. The Langevin version of the algorithm has an additional hyperparameter L_partial that determines the partial refreshment strength during the proposal trajectories, i.e., the amount of Langevin noise. We set the acceptance rate to 90%. In the first stage of tuning, we use a stochastic optimization scheme, dual averaging (Nesterov, 2009) from (Hoffman and Gelman, 2014), to adapt the step size until a desired acceptance rate is achieved. We determine the proportionality constant of Equation (12) in a way that L equals the optimal L, determined by a grid search, for the standard Gaussian. We find a proportionality constant of 0.3 for MAMS without Langevin noise and 0.23 with Langevin noise. We will use the same setting for Langevin MAMS, so L_partial/L = 1.25.
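
The step-size adaptation quoted in the Experiment Setup row can be illustrated with a minimal sketch of the dual-averaging update from Hoffman and Gelman (2014), targeting the 90% acceptance rate mentioned above. This is not the paper's or blackjax's tuning code: the function names, the constants gamma = 0.05, t0 = 10, kappa = 0.75 (the defaults suggested by Hoffman and Gelman), and in particular the toy acceptance curve exp(-eps^2) are assumptions made for illustration. A real sampler would supply the observed acceptance probability of each proposal instead.

```python
import math


def toy_accept_prob(eps: float) -> float:
    """Hypothetical stand-in for a sampler's acceptance probability.

    Assumed for illustration only: acceptance decays smoothly as the
    step size grows. A real sampler would report this per proposal.
    """
    return math.exp(-eps ** 2)


def dual_averaging_step_size(accept_prob_fn, eps0=1.0, target=0.9,
                             n_iter=2000, gamma=0.05, t0=10.0, kappa=0.75):
    """Adapt a step size so that the acceptance rate approaches `target`.

    Follows the dual-averaging recursion of Hoffman and Gelman (2014);
    returns the averaged step size after `n_iter` updates.
    """
    mu = math.log(10.0 * eps0)   # point the iterates are shrunk towards
    h_bar = 0.0                  # running average of (target - acceptance)
    log_eps = math.log(eps0)     # current (noisy) iterate
    log_eps_bar = 0.0            # averaged iterate, overwritten at m = 1

    for m in range(1, n_iter + 1):
        alpha = accept_prob_fn(math.exp(log_eps))
        w = 1.0 / (m + t0)
        h_bar = (1.0 - w) * h_bar + w * (target - alpha)
        log_eps = mu - math.sqrt(m) / gamma * h_bar
        eta = m ** (-kappa)
        log_eps_bar = eta * log_eps + (1.0 - eta) * log_eps_bar

    return math.exp(log_eps_bar)


tuned_eps = dual_averaging_step_size(toy_accept_prob)
print(f"tuned step size: {tuned_eps:.3f}, "
      f"acceptance at tuned step size: {toy_accept_prob(tuned_eps):.3f}")
```

The averaged iterate is returned rather than the last one because, with a noisy acceptance signal, individual iterates fluctuate while the weighted average settles. In the setup described above, the trajectory length L would then be tuned in a separate stage.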