Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Replica-Exchange Nosé-Hoover Dynamics for Bayesian Learning on Large Datasets

Authors: Rui Luo, Qiang Zhang, Yaodong Yang, Jun Wang

NeurIPS 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	experiments with deep Bayesian neural networks on large-scale datasets have shown its significant improvements over strong baselines.
Researcher Affiliation	Collaboration	¹University College London, ²Huawei R&D U.K.
Pseudocode	Yes	Algorithm 1 The replica-exchange protocol; Algorithm 2 Replica-exchange Nosé-Hoover dynamics
Open Source Code	No	The text mentions a GitHub link in footnote 3 (https://github.com/BIDData/BIDMach) but states it is for 'pre-computed density3 and the conventional methods', referring to a baseline/comparison method, not the authors' own source code for RENHD.
Open Datasets	Yes	We run two tasks of image classification on real datasets: Fashion-MNIST on a recurrent neural network and CIFAR-10 on a residual network (ResNet) [18]
Dataset Splits	No	The paper does not explicitly provide training/test/validation dataset splits with specific percentages or counts. It mentions 'Random permutation is applied to a percentage (0%, 20%, or 30%) of the training labels' but this refers to data augmentation/uncertainty, not standard data splitting.
Hardware Specification	Yes	It took 2.5 hours for the replica ensemble to find a good mode on a single Titan Xp.
Software Dependencies	No	The paper does not provide specific version numbers for software dependencies (e.g., Python, deep learning frameworks, or libraries).
Experiment Setup	Yes	For all methods, a single run has 1000 epochs. Random permutation is applied to a percentage (0%, 20%, or 30%) of the training labels... We set the mini-batch size |S|_nhd = 128 for the Nosé-Hoover dynamics and |S|_re = 256 for the exchange protocol. The ladder is built with M = 12 rungs with geometric factor τ = 1.2 such that the rate of exchange in the experiment is roughly 30%-40%. For the dynamic parameters, the additive Gaussian intensity c = 0.1 and the step size ϵ = 5 × 10^-6 in (20). To propose a new sample, the dynamics will simulate a trajectory of length N = 200.
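
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object, as a rough sketch only. The names `RenhdSetup` and `temperature_ladder` are hypothetical and do not come from the paper or any released code. The ladder formula (rung k at base temperature times the factor to the power k) and the base temperature of 1.0 are assumptions about what "geometric factor" means here. The step size is read as 5e-6, since the exponent sign is lost in the extracted text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RenhdSetup:
    """Hyperparameters as quoted in the Experiment Setup row (names are hypothetical)."""
    epochs: int = 1000                # a single run has 1000 epochs
    batch_size_nhd: int = 128         # mini-batch size for the Nosé-Hoover dynamics
    batch_size_re: int = 256          # mini-batch size for the exchange protocol
    num_rungs: int = 12               # M, number of rungs in the ladder
    geometric_factor: float = 1.2     # tau, ratio between neighbouring rungs
    gaussian_intensity: float = 0.1   # c, additive Gaussian intensity
    step_size: float = 5e-6           # epsilon; ASSUMED reading of the garbled "5 10 6"
    trajectory_length: int = 200      # N, steps simulated per proposal


def temperature_ladder(num_rungs: int, factor: float,
                       base_temperature: float = 1.0) -> List[float]:
    """Build a geometric ladder: rung k sits at base_temperature * factor**k.

    ASSUMPTION: the base temperature of 1.0 and this exact formula are
    illustrative; the source only states the rung count and the factor.
    """
    return [base_temperature * factor ** k for k in range(num_rungs)]


setup = RenhdSetup()
ladder = temperature_ladder(setup.num_rungs, setup.geometric_factor)
```

Under these assumptions the highest rung sits at 1.2 to the power 11 times the base temperature. Whether the paper's ladder starts at unit temperature cannot be determined from the quoted text.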