Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Replica-Exchange Nosé-Hoover Dynamics for Bayesian Learning on Large Datasets

Authors: Rui Luo, Qiang Zhang, Yaodong Yang, Jun Wang

NeurIPS 2020 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | experiments with deep Bayesian neural networks on large-scale datasets have shown its significant improvements over strong baselines.
Researcher Affiliation | Collaboration | (1) University College London, (2) Huawei R&D U.K.
Pseudocode | Yes | Algorithm 1 The replica-exchange protocol; Algorithm 2 Replica-exchange Nosé-Hoover dynamics
Open Source Code | No | The text mentions a GitHub link in footnote 3 (https://github.com/BIDData/BIDMach) but states it is for 'pre-computed density3 and the conventional methods', referring to a baseline/comparison method, not the authors' own source code for RENHD.
Open Datasets | Yes | We run two tasks of image classification on real datasets: Fashion-MNIST on a recurrent neural network and CIFAR-10 on a residual network (ResNet) [18]
Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits with specific percentages or counts. It mentions 'Random permutation is applied to a percentage (0%, 20%, or 30%) of the training labels' but this refers to data augmentation/uncertainty, not standard data splitting.
Hardware Specification | Yes | It took 2.5 hours for the replica ensemble to find a good mode on a single Titan Xp.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies (e.g., Python, deep learning frameworks, or libraries).
Experiment Setup | Yes | For all methods, a single run has 1000 epochs. Random permutation is applied to a percentage (0%, 20%, or 30%) of the training labels... We set the mini-batch size |S|_nhd = 128 for the Nosé-Hoover dynamics and |S|_re = 256 for the exchange protocol. The ladder is built with M = 12 rungs with geometric factor τ = 1.2 such that the rate of exchange in the experiment is roughly 30%-40%. For the dynamic parameters, the additive Gaussian intensity c = 0.1 and the step size ϵ = 5 × 10⁻⁶ in (20). To propose a new sample, the dynamics will simulate a trajectory of length N = 200.
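
For readers who want to reuse the quoted experiment setup, the values can be gathered into a small configuration object. The following is a minimal sketch, not the authors' code: the names `ExperimentSetup` and `temperature_ladder` are hypothetical, the step size is read as 5 × 10⁻⁶, and the ladder assumes a base temperature of 1 with each rung multiplied by the geometric factor, which the quoted text does not state explicitly.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ExperimentSetup:
    """Hyperparameters as quoted in the Experiment Setup row (names are hypothetical)."""
    epochs: int = 1000                    # a single run has 1000 epochs
    batch_size_nhd: int = 128             # mini-batch size for the Nosé-Hoover dynamics
    batch_size_re: int = 256              # mini-batch size for the exchange protocol
    num_rungs: int = 12                   # M, number of rungs in the ladder
    geometric_factor: float = 1.2         # tau, ratio between adjacent rungs
    gaussian_intensity: float = 0.1       # c, additive Gaussian intensity
    step_size: float = 5e-6               # epsilon, read here as 5 x 10^-6
    trajectory_length: int = 200          # N, steps simulated per proposal
    label_noise: Tuple[float, ...] = (0.0, 0.2, 0.3)  # fraction of permuted training labels


def temperature_ladder(num_rungs: int, factor: float, base: float = 1.0) -> List[float]:
    """Geometric ladder: rung k has temperature base * factor**k.

    ASSUMPTION: the base temperature of 1.0 and the indexing from k = 0
    are illustrative choices, not taken from the paper.
    """
    return [base * factor ** k for k in range(num_rungs)]


setup = ExperimentSetup()
ladder = temperature_ladder(setup.num_rungs, setup.geometric_factor)
```

Under these assumptions the ladder has 12 temperatures, each 1.2 times the previous one; a different base temperature only rescales the whole ladder.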