Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Enhanced gradient-based MCMC in discrete spaces

Authors: Benjamin Rhodes, Michael U. Gutmann

TMLR 2022 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate the newly proposed methods NCG, AVG & PAVG on four problem types: 1) sampling from highly correlated ordinal mixture distributions 2) a sparse Bayesian variable selection problem 3) estimation of Ising models and 4) sampling a deep energy-based model parameterised by a convolutional neural network. Our key baselines are Gibbs-with-Gradients (GWG; Grathwohl et al., 2021) and a standard Gibbs sampler (Geman & Geman, 1984).
Researcher Affiliation	Academia	Benjamin Rhodes EMAIL University of Edinburgh Michael U. Gutmann EMAIL University of Edinburgh
Pseudocode	Yes	Algorithm B.1 NCG step; Algorithm B.2 AVG step; Algorithm B.3 PAVG step; Algorithm F.4 Adaptive learning of preconditioning matrix. (Default values in brackets are used across all experiments); Algorithm F.5 Adapt γ to maximise jump distance $\|s_t - s_{t-1}\|_1$; Algorithm L.6 Persistent contrastive divergence with buffer
Open Source Code	No	This vectorised PyTorch code will be accessible upon publication.
Open Datasets	Yes	We apply this methodology to the USPS 256-dimensional image dataset of binarised handwritten digits (Hull, 1994)
Dataset Splits	No	The paper does not provide specific dataset splits like train/test/validation percentages or counts for the input data. It describes how MCMC chains are run (e.g., "Run 100 parallel chains for 10 minutes with a burn-in period of 1 minute") and the size of a generated dataset (e.g., "The dataset D consists of 10,000 samples"), but these are not splits of an original dataset for model training/evaluation.
Hardware Specification	No	The paper mentions "efficient GPU acceleration" in Appendix B but does not provide specific details on the GPU models or any other hardware specifications used for the experiments.
Software Dependencies	Yes	tfp.mcmc.effective_sample_size(S, filter_beyond_positive_pairs=True) using version 0.14.1 of tensorflow-probability. We use version 0.9.8 of the igraph package.
Experiment Setup	Yes	Our grid-search based tuning procedure involves running each sampler for a short amount of time (1000 iterations, which takes 1 minute in most of our experiments) with different step-sizes, and selecting the step-size that maximises the average L1-distance $\|s_{t+1} - s_t\|_1$ between successive states (averaged over all time-steps and parallel chains). For NCG, AVG & PAVG we first identify the best order-of-magnitude by searching, in parallel, over the 5 values in the set {0.05, 0.5, 5.0, 50.0, 500.0}. We set Niters = 2,000, Nbatch = 50, Nbuffer = 5000, ϵ = 0.0003. We use weight decay of 0.0001 on the neural net weights.
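
The step-size tuning quoted in the Experiment Setup row can be illustrated with a minimal sketch. It covers only the first, order-of-magnitude stage of the grid search. It is not the authors' code: `mean_jump_distance`, `select_step_size` and `toy_sampler` are hypothetical names, and `toy_sampler` is a stand-in random-walk Metropolis sampler on the integer lattice (assumed here purely so that the example runs), not NCG, AVG or PAVG. The candidates are also evaluated one after another here, whereas the quoted setup searches them in parallel.

```python
import numpy as np


def mean_jump_distance(states):
    """Average L1 distance between successive states.

    states has shape (n_iters + 1, n_chains, dim); the mean is taken over
    all time-steps and all parallel chains.
    """
    jumps = np.abs(np.diff(states, axis=0)).sum(axis=-1)  # (n_iters, n_chains)
    return float(jumps.mean())


def select_step_size(run_sampler, step_sizes, n_iters=1000):
    """Run a short chain per candidate and keep the one with the largest jump."""
    scores = {eps: mean_jump_distance(run_sampler(eps, n_iters)) for eps in step_sizes}
    best = max(scores, key=scores.get)
    return best, scores


def toy_sampler(step_size, n_iters, n_chains=8, dim=4, seed=0):
    """Hypothetical stand-in sampler (an assumption, not a method from the paper).

    Random-walk Metropolis on the integer lattice with target
    p(x) proportional to exp(-||x||^2 / 8). The rounded Gaussian proposal is
    symmetric, so the plain Metropolis acceptance ratio applies.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros((n_chains, dim), dtype=np.int64)
    trajectory = [x.copy()]
    for _ in range(n_iters):
        step = np.rint(step_size * rng.standard_normal(x.shape)).astype(np.int64)
        proposal = x + step
        log_ratio = (np.sum(x**2, axis=-1) - np.sum(proposal**2, axis=-1)) / 8.0
        accept = np.log(rng.random(n_chains)) < log_ratio
        x = np.where(accept[:, None], proposal, x)
        trajectory.append(x.copy())
    return np.stack(trajectory)


# The five order-of-magnitude candidates quoted in the table.
grid = [0.05, 0.5, 5.0, 50.0, 500.0]
best, scores = select_step_size(toy_sampler, grid, n_iters=1000)
```

In this toy setting, very small step-sizes never leave the current state and very large ones are almost always rejected, so both score near zero on the jump-distance criterion. That trade-off is what the criterion is meant to expose.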