A Gradient Based Strategy for Hamiltonian Monte Carlo Hyperparameter Optimization
Authors: Andrew Campbell, Wenlong Chen, Vincent Stimper, José Miguel Hernández-Lobato, Yichuan Zhang
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our proposed method and compare to baselines on a variety of problems including sampling from synthetic 2D distributions, reconstructing sparse signals, learning deep latent variable models and sampling molecular configurations from the Boltzmann distribution of a 22 atom molecule. We find that our method is competitive with or improves upon alternative baselines in all these experiments. |
| Researcher Affiliation | Collaboration | ¹Department of Statistics, University of Oxford; ²Baidu, Inc.; ³Department of Engineering, University of Cambridge; ⁴Max Planck Institute for Intelligent Systems; ⁵Boltzbit Ltd. |
| Pseudocode | Yes | Algorithm 1 summarizes our optimization strategy, where $\mathrm{Adam\_update}(\eta, \nabla_\eta \mathcal{L}, i)$ returns the new value for $\eta$ given by the $i$-th iteration of the Adam optimizer using gradient $\nabla_\eta \mathcal{L}$, $D_\alpha(q^{(0)}_{\psi_{i-1}}(x) \,\|\, p(x))$ is an estimate of the $\alpha$-divergence whose gradient is computed using doubly reparameterized gradient estimators (Tucker et al., 2019), $\mathrm{SKSD}(x^{(T)}_{1:N}, \mathrm{score}(x))$ estimates the sliced kernelized Stein discrepancy, and $\mathrm{HMC}_{\phi_{i-1}}(x^{(0)}_n, \mathrm{score}(x))$ runs an HMC chain with initial state $x^{(0)}_n$, target score function $\mathrm{score}(x)$ and hyperparameters $\phi_{i-1}$. Details on the computation of $D_\alpha(q^{(0)}_{\psi_{i-1}}(x) \,\|\, p(x))$ and $\mathrm{SKSD}(x^{(T)}_{1:N}, \mathrm{score}(x))$ are given in the Supplementary Material. |
| Open Source Code | Yes | We provide code for reproducing all our experiments on GitHub: https://github.com/VincentStimper/hmc-hyperparameter-tuning |
| Open Datasets | Yes | We consider two benchmark datasets: MNIST and Fashion MNIST. |
| Dataset Splits | No | The paper mentions using a 'validation set' for tuning in the sparse signal recovery experiments ('to maximize the log marginal likelihood on a validation set'), but does not provide specific details on how this set was derived (e.g., percentages, sample counts, or explicit standard split citation) for reproduction. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | We use 30 step HMC chains and a factorized Gaussian $q^{(0)}(x)$ that is trained by minimizing the $\alpha$-divergence. We optimize step sizes and masses using (1), a procedure we refer to as max ELT for maximizing the expected log target. Additionally, as an ablation study, we consider different initial distribution training strategies: $\alpha = 0$ or $1$ and whether or not to tune the scaling $s$ by minimizing the SKSD ($s = 1$ when not tuned). We consider HMC chains of length 30 with 5 leapfrog iterations per step and tune a different step size parameter per dimension and per step in the HMC chain, while the mass parameters are all kept constant and equal to 1 throughout the HMC chain. (A hedged code sketch of this setup follows the table.) |
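
As a reading aid, here is a minimal PyTorch sketch of the tuning scheme described in the Pseudocode and Experiment Setup rows above: a differentiable 30-step HMC chain with 5 leapfrog iterations per step, a separate step size per dimension and per HMC step, unit masses, and an Algorithm-1-style outer loop that Adam-updates the step sizes to maximize the expected log target (max ELT). The target density, learning rate, chain count, and the omission of the Metropolis correction and of the $\alpha$-divergence/SKSD terms are illustrative assumptions, not details taken from the paper or its repository.

```python
import torch

torch.manual_seed(0)

dim = 2            # dimensionality of the target distribution
n_hmc_steps = 30   # HMC chain length (30 steps, as in the paper)
n_leapfrog = 5     # leapfrog iterations per HMC step
n_chains = 128     # parallel chains sampled per optimizer iteration (assumed)


def log_target(x):
    # Placeholder target: a standard Gaussian log density.
    return -0.5 * (x ** 2).sum(-1)


def grad_log_target(x):
    # Analytic score of the placeholder target; a real implementation
    # would obtain this via autograd from the actual log density.
    return -x


# Per-step, per-dimension step sizes, optimized in log space for positivity.
# All mass parameters are kept fixed to 1, as described in the setup row.
log_eps = torch.zeros(n_hmc_steps, dim, requires_grad=True)

# Fixed factorized Gaussian initial distribution q^(0)(x); in the paper its
# parameters are additionally trained by minimizing an alpha-divergence.
q0_mean = torch.zeros(dim)
q0_log_std = torch.zeros(dim)


def run_hmc_chain(x):
    """Run a differentiable HMC chain of n_hmc_steps transitions.

    The Metropolis accept/reject step is omitted so that gradients can flow
    from the final states back to the step sizes (an illustrative
    simplification, not the authors' exact scheme)."""
    for t in range(n_hmc_steps):
        eps = log_eps[t].exp()      # per-dimension step sizes for step t
        p = torch.randn_like(x)     # resample unit-mass momenta
        # Leapfrog integration: half momentum step, alternating full
        # position/momentum steps, final half momentum step.
        g = grad_log_target(x)
        p = p + 0.5 * eps * g
        for k in range(n_leapfrog):
            x = x + eps * p
            g = grad_log_target(x)
            if k < n_leapfrog - 1:
                p = p + eps * g
        p = p + 0.5 * eps * g
    return x


# Algorithm-1-style outer loop: Adam updates of the step sizes that
# maximize the expected log target (max ELT) of the final chain states.
optimizer = torch.optim.Adam([log_eps], lr=1e-2)
for i in range(200):
    x0 = q0_mean + q0_log_std.exp() * torch.randn(n_chains, dim)
    xT = run_hmc_chain(x0)
    loss = -log_target(xT).mean()   # negative expected log target
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

For the full procedure, including the $\alpha$-divergence training of $q^{(0)}(x)$ with doubly reparameterized gradients and the SKSD-based tuning of the scaling $s$, the authors' repository linked above is the authoritative reference.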