A Gradient Based Strategy for Hamiltonian Monte Carlo Hyperparameter Optimization

Authors: Andrew Campbell, Wenlong Chen, Vincent Stimper, José Miguel Hernández-Lobato, Yichuan Zhang

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our proposed method and compare to baselines on a variety of problems including sampling from synthetic 2D distributions, reconstructing sparse signals, learning deep latent variable models and sampling molecular configurations from the Boltzmann distribution of a 22 atom molecule. We find that our method is competitive with or improves upon alternative baselines in all these experiments.
Researcher Affiliation | Collaboration | 1 Department of Statistics, University of Oxford; 2 Baidu, Inc.; 3 Department of Engineering, University of Cambridge; 4 Max Planck Institute for Intelligent Systems; 5 Boltzbit Ltd.
Pseudocode | Yes | Algorithm 1 summarizes our optimization strategy, where Adam_update(η, ∇_η L, i) returns the new value for η given by the i-th iteration of the Adam optimizer using gradient ∇_η L, D_α(q^(0)_{ψ_{i-1}}(x) || p(x)) is an estimate of the α-divergence whose gradient is computed using doubly reparameterized gradient estimators (Tucker et al., 2019), SKSD(x^(T)_{1:N}, score(x)) estimates the sliced kernelized Stein discrepancy, and HMC_{φ_{i-1}}(x^(0)_n, score(x)) runs an HMC chain with initial state x^(0)_n, target score function score(x) and hyperparameters φ_{i-1}. Details on the computation of D_α(q^(0)_{ψ_{i-1}}(x) || p(x)) and SKSD(x^(T)_{1:N}, score(x)) are given in the Supplementary Material. (A simplified code sketch of this alternating optimization loop is given after the table.)
Open Source Code | Yes | We provide code for reproducing all our experiments on GitHub: https://github.com/VincentStimper/hmc-hyperparameter-tuning
Open Datasets | Yes | We consider two benchmark datasets: MNIST and Fashion MNIST.
Dataset Splits | No | The paper mentions using a 'validation set' for tuning in the sparse signal recovery experiments ('to maximize the log marginal likelihood on a validation set'), but does not provide specific details on how this set was derived (e.g., percentages, sample counts, or a citation to a standard split) needed for reproduction.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | We use 30 step HMC chains and a factorized Gaussian q^(0)(x) that is trained by minimizing the α-divergence. We optimize step sizes and masses using (1), a procedure we refer to as max ELT for maximizing the expected log target. Additionally, as an ablation study, we consider different initial distribution training strategies: α = 0 or 1 and whether or not to tune the scaling s by minimizing the SKSD (s = 1 when not tuned). We consider HMC chains of length 30 with 5 leapfrog iterations per step and tune a different step size parameter per dimension and per step in the HMC chain, while the mass parameters are all kept constant and equal to 1 throughout the HMC chain. (A hypothetical configuration matching this setup is sketched after the table.)
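The following is a minimal, self-contained sketch of the alternating updates described in Algorithm 1, assuming a PyTorch implementation. It is not the authors' code: the initial distribution q^(0) is trained here with a simple reverse-KL surrogate instead of the doubly reparameterized α-divergence estimator, the SKSD term for the scale s is omitted, the target is a toy 2D Gaussian, and the masses are fixed to 1. All names (log_target, hmc_step, log_eps, ...) are illustrative.

```python
# Minimal sketch of the Algorithm 1 loop (assumptions noted above; toy target,
# reverse-KL stand-in for the alpha-divergence, no SKSD term, unit masses).
import math
import torch

torch.manual_seed(0)
D, N, T, L = 2, 256, 10, 5   # dimension, batch size, HMC steps, leapfrog iterations

def log_target(x):
    """Toy target: zero-mean correlated 2D Gaussian (unnormalized log density)."""
    prec = torch.linalg.inv(torch.tensor([[1.0, 0.8], [0.8, 1.0]]))
    return -0.5 * torch.einsum("ni,ij,nj->n", x, prec, x)

# Trainable initial distribution q^(0): factorized Gaussian (parameters psi).
q0_mean = torch.nn.Parameter(torch.zeros(D))
q0_log_std = torch.nn.Parameter(torch.zeros(D))

# Trainable HMC hyperparameters (phi): one step size per HMC step and per dimension.
log_eps = torch.nn.Parameter(torch.full((T, D), -1.0))

def grad_log_target(x):
    # create_graph=True keeps the graph so step sizes can be trained through the dynamics
    return torch.autograd.grad(log_target(x).sum(), x, create_graph=True)[0]

def hmc_step(x, eps):
    """One differentiable HMC step (leapfrog + Metropolis) with unit mass matrix."""
    p0 = torch.randn_like(x)
    xn, p = x, p0 + 0.5 * eps * grad_log_target(x)
    for i in range(L):
        xn = xn + eps * p
        g = grad_log_target(xn)
        p = p + (eps if i < L - 1 else 0.5 * eps) * g
    h0 = -log_target(x) + 0.5 * (p0 ** 2).sum(-1)
    h1 = -log_target(xn) + 0.5 * (p ** 2).sum(-1)
    accept = (torch.rand(x.shape[0]) < (h0 - h1).clamp(max=0.0).exp()).unsqueeze(-1)
    return torch.where(accept, xn, x)   # gradients flow through accepted states

opt_q0 = torch.optim.Adam([q0_mean, q0_log_std], lr=1e-2)
opt_hmc = torch.optim.Adam([log_eps], lr=1e-2)

for it in range(200):
    # (1) Update psi: reverse-KL surrogate in place of the alpha-divergence estimate.
    noise = torch.randn(N, D)
    x0 = q0_mean + q0_log_std.exp() * noise
    log_q0 = (-0.5 * noise ** 2 - q0_log_std - 0.5 * math.log(2 * math.pi)).sum(-1)
    loss_q0 = (log_q0 - log_target(x0)).mean()
    opt_q0.zero_grad(); loss_q0.backward(); opt_q0.step()

    # (2) Update phi by maximizing the expected log target of the final HMC state
    # ("max ELT"); q^(0) samples are detached so only the step sizes are affected.
    x = (q0_mean + q0_log_std.exp() * torch.randn(N, D)).detach().requires_grad_(True)
    for t in range(T):
        x = hmc_step(x, log_eps[t].exp())
    loss_hmc = -log_target(x).mean()
    opt_hmc.zero_grad(); loss_hmc.backward(); opt_hmc.step()
```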
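To tie the stated experiment setup to the sketch above, a hypothetical configuration would use a chain length of 30, 5 leapfrog iterations per step, one trainable step size per HMC step and per dimension, unit masses, and a factorized Gaussian q^(0). The dimensionality D below is a placeholder, not a value from the paper.

```python
import torch

# Hypothetical configuration mirroring the described setup (D is a placeholder).
D = 100                      # target dimensionality (problem dependent)
T, L = 30, 5                 # 30 HMC steps, 5 leapfrog iterations per step

# One trainable step size per HMC step and per dimension (tuned via max ELT).
log_eps = torch.nn.Parameter(torch.full((T, D), -2.0))

# Mass parameters kept constant and equal to 1 throughout the chain (not tuned).
inv_mass = torch.ones(D)

# Factorized Gaussian initial distribution q^(0), trained via the alpha-divergence.
q0_mean = torch.nn.Parameter(torch.zeros(D))
q0_log_std = torch.nn.Parameter(torch.zeros(D))
```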