Stochastic Gradient Hamiltonian Monte Carlo

Authors: Tianqi Chen, Emily Fox, Carlos Guestrin

ICML 2014

Reproducibility assessment: each entry below gives the variable, the assessed result, and the LLM response (quoting the paper's supporting text where available).

Research Type: Experimental
"A number of simulated experiments validate our theoretical results and demonstrate the differences between (i) exact HMC, (ii) the naïve implementation of stochastic gradient HMC (simply replacing the gradient with a stochastic gradient), and (iii) our proposed method incorporating friction. We also compare to the first-order Langevin dynamics of SGLD. Finally, we apply our proposed methods to a classification task using Bayesian neural networks and to online Bayesian matrix factorization of a standard movie dataset. Our experimental results demonstrate the effectiveness of the proposed algorithm."

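The friction-corrected update this excerpt refers to can be sketched in a few lines. Below is a minimal illustration in the practical parameterization of the paper's Algorithm 2 (step size eta, friction/momentum decay alpha, gradient-noise estimate beta_hat = 0); the one-dimensional standard-normal target and the synthetic gradient noise are assumptions made for self-containment, not details from the paper.

```python
import numpy as np

def grad_U_hat(theta, rng):
    # Noisy gradient of U(theta) = theta^2 / 2 (standard normal target);
    # the added Gaussian term mimics minibatch gradient noise.
    return theta + 0.1 * rng.standard_normal(theta.shape)

rng = np.random.default_rng(0)
theta, v = np.zeros(1), np.zeros(1)
eta, alpha, beta_hat = 1e-2, 0.1, 0.0   # step size, friction, noise estimate
samples = []
for _ in range(10_000):
    # SGHMC with friction: v <- (1 - alpha) v - eta * grad_U_hat + noise,
    # where the injected noise has variance 2 (alpha - beta_hat) eta.
    noise = np.sqrt(2.0 * (alpha - beta_hat) * eta) * rng.standard_normal(1)
    v = (1.0 - alpha) * v - eta * grad_U_hat(theta, rng) + noise
    theta = theta + v
    samples.append(theta.item())
```

Setting alpha = 0 removes both the friction term and the injected noise, recovering the naïve stochastic-gradient HMC that the excerpt contrasts against.
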
Researcher Affiliation: Academia
"Tianqi Chen (TQCHEN@CS.WASHINGTON.EDU), Emily B. Fox (EBFOX@STAT.WASHINGTON.EDU), Carlos Guestrin (GUESTRIN@CS.WASHINGTON.EDU), MODE Lab, University of Washington, Seattle, WA."

Pseudocode: Yes
"Algorithm 1: Hamiltonian Monte Carlo"

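Since the response cites Algorithm 1 (standard HMC), here is a self-contained sketch of one HMC transition with leapfrog integration and a Metropolis correction; `U` and `grad_U` are user-supplied potential energy and gradient, and the step-size defaults are illustrative assumptions.

```python
import numpy as np

def hmc_step(theta, U, grad_U, eps=0.1, L=10, rng=None):
    """One standard HMC transition (leapfrog integration plus a
    Metropolis accept/reject step), in the spirit of Algorithm 1."""
    rng = rng or np.random.default_rng()
    r = rng.standard_normal(theta.shape)        # resample momentum
    theta_new, r_new = theta.copy(), r.copy()
    r_new -= 0.5 * eps * grad_U(theta_new)      # initial half momentum step
    for _ in range(L - 1):
        theta_new = theta_new + eps * r_new     # full position step
        r_new -= eps * grad_U(theta_new)        # full momentum step
    theta_new = theta_new + eps * r_new
    r_new -= 0.5 * eps * grad_U(theta_new)      # final half momentum step
    # Metropolis correction on the full Hamiltonian H = U + kinetic energy
    H_old = U(theta) + 0.5 * np.sum(r * r)
    H_new = U(theta_new) + 0.5 * np.sum(r_new * r_new)
    accept = rng.random() < np.exp(min(0.0, H_old - H_new))
    return theta_new if accept else theta
```

For a standard normal target, for example, `U = lambda th: 0.5 * np.sum(th ** 2)` with `grad_U = lambda th: th` reproduces the textbook setup.
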
Open Source Code: No
The paper makes no unambiguous statement about releasing source code for the described methodology, and provides no repository link.

Open Datasets: Yes
"We also test our method on a handwritten digits classification task using the MNIST dataset." and "We conduct an experiment in online Bayesian PMF on the Movielens dataset ml-1M." (footnote 5: http://grouplens.org/datasets/movielens/)

Dataset Splits: Yes
"We randomly split a validation set containing 10,000 instances from the training data in order to select training parameters, and use the remaining 50,000 instances for training."

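A minimal sketch of that split, assuming arrays `X` and `y` hold the 60,000 MNIST training instances; the seed is an arbitrary assumption, since the paper does not specify one.

```python
import numpy as np

rng = np.random.default_rng(0)          # seed is an assumption, not from the paper
perm = rng.permutation(60_000)          # shuffle the 60,000 training instances
val_idx, train_idx = perm[:10_000], perm[10_000:]
# e.g. X_val, y_val = X[val_idx], y[val_idx]           # 10,000 for model selection
#      X_train, y_train = X[train_idx], y[train_idx]   # 50,000 for training
```
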
Hardware Specification: No
The paper does not report the hardware used for its experiments (no CPU/GPU models, clock speeds, or memory amounts).

Software Dependencies: No
The paper does not name ancillary software with version numbers (e.g., Python 3.8, CPLEX 12.4) needed to replicate the experiments.

Experiment Setup: Yes
"For the sampling-based methods, we take a fully Bayesian approach and place a weakly informative gamma prior on each layer's weight regularizer λ. The sampling procedure is carried out by running SGHMC and SGLD using minibatches of 500 training instances, then resampling hyperparameters after an entire pass over the training set. We run the samplers for 800 iterations (each over the entire training dataset) and discard the initial 50 samples as burn-in."

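The quoted schedule can be summarized in code. In the sketch below, `sghmc_pass` and `resample_hyperparams` are hypothetical stubs standing in for one SGHMC sweep over the training set and the Gibbs update of the gamma-prior regularizers; only the schedule itself (minibatches of 500, 800 full passes, the first 50 samples discarded as burn-in) comes from the quoted text.

```python
import numpy as np

def sghmc_pass(theta, v, minibatch_size=500):
    return theta, v                 # stand-in: one SGHMC sweep over the data

def resample_hyperparams(theta):
    return np.ones_like(theta)      # stand-in: Gibbs update of gamma-prior lambdas

theta, v = np.zeros(10), np.zeros(10)
num_passes, burn_in = 800, 50
samples = []
for t in range(num_passes):
    theta, v = sghmc_pass(theta, v, minibatch_size=500)
    lam = resample_hyperparams(theta)   # resample after each full pass
    if t >= burn_in:                    # discard the first 50 samples as burn-in
        samples.append(theta.copy())
```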