Generalizing Hamiltonian Monte Carlo with Neural Networks

Authors: Daniel Levy, Matthew D. Hoffman, Jascha Sohl-Dickstein

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate large empirical gains on a collection of simple but challenging distributions, for instance achieving a 106× improvement in effective sample size in one case, and mixing when standard HMC makes no measurable progress in a second. Finally, we show quantitative and qualitative gains on a real-world task: latent-variable generative modeling. (A sketch of how effective sample size can be estimated is given below.)
Researcher Affiliation | Collaboration | Daniel Levy (Stanford University), Matthew D. Hoffman (Google AI Perception), Jascha Sohl-Dickstein (Google Brain); danilevy@cs.stanford.edu, {mhoffman,jaschasd}@google.com
Pseudocode | Yes | Algorithm 1: Training L2HMC; Algorithm 2: L2HMC for latent variable generative models
Open Source Code | Yes | We release an open source TensorFlow implementation of the algorithm. Code implementing our algorithm is available online at https://github.com/brain-research/l2hmc.
Open Datasets | Yes | All experiments were done on the dynamically binarized MNIST dataset (LeCun). (A sketch of dynamic binarization is given below.)
Dataset Splits | No | The paper uses the MNIST dataset but does not specify explicit training, validation, and test splits (e.g., percentages or sample counts). It refers to "training and held-out data" but gives no numerical details for the split.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for its experiments. It does not mention any particular processor, GPU, or cloud instance type.
Software Dependencies | No | The paper mentions TensorFlow but does not specify a version number or list other software dependencies with their respective versions.
Experiment Setup | Yes | Our decoder (pφ) is a neural network with 2 fully connected layers, with 1024 units each and softplus non-linearities, and outputs Bernoulli activation probabilities for each pixel. The encoder (qψ) has the same architecture, returning mean and variance for the approximate posterior. Our model was trained for 300 epochs with Adam (Kingma & Ba, 2014) and a learning rate α = 10^-3. We train with Adam (Kingma & Ba, 2014) and a learning rate α = 10^-3. We train for 5,000 iterations with a batch size of 200. (A sketch of this setup is given below.)
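
Effective sample size (ESS) is the metric behind the quoted 106× claim. The paper's exact estimator is not reproduced in this table, so the following is a minimal sketch of a common single-chain ESS estimate; the function name `effective_sample_size` and the simple truncation rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def effective_sample_size(chain):
    """Crude ESS estimate for a 1-D MCMC chain (illustrative, not the paper's estimator).

    ESS = N / (1 + 2 * sum_t rho_t), where rho_t is the lag-t autocorrelation,
    summed until it first drops below zero (a simple truncation rule).
    """
    x = np.asarray(chain, dtype=float)
    n = len(x)
    x = x - x.mean()
    # Empirical autocorrelation at lags 0..n-1.
    acf = np.correlate(x, x, mode="full")[n - 1:] / (np.arange(n, 0, -1) * x.var())
    tau = 1.0  # accumulates 1 + 2 * sum of positive-lag autocorrelations
    for t in range(1, n):
        if acf[t] < 0:
            break
        tau += 2.0 * acf[t]
    return n / tau

# Example usage: compare two samplers run on the same target distribution.
# ess_ratio = effective_sample_size(l2hmc_chain) / effective_sample_size(hmc_chain)
```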
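"Dynamically binarized MNIST" usually means that each grayscale pixel is treated as a Bernoulli probability and re-sampled every time an example is drawn. The paper does not spell the procedure out beyond citing LeCun's MNIST, so the sketch below reflects that common convention; the helper name `dynamically_binarize` is made up for illustration.

```python
import numpy as np

def dynamically_binarize(batch, rng=None):
    """Resample binary pixels each time a batch is drawn.

    `batch`: array of grayscale MNIST pixels scaled to [0, 1]. Each pixel value
    is used as the probability of that pixel being 1, and a fresh Bernoulli
    sample is drawn on every call, so the model never sees the same
    binarization twice.
    """
    rng = np.random.default_rng() if rng is None else rng
    return (rng.uniform(size=batch.shape) < batch).astype(np.float32)
```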
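The Experiment Setup row pins down the decoder/encoder architecture and optimizer settings. Below is a minimal sketch of that configuration in TF2/Keras; the released implementation uses an earlier TensorFlow API, the latent dimensionality is not quoted in this table (so `latent_dim = 50` is an assumption), and the log-variance parameterization of the encoder output is a common convention rather than a detail stated in the quote.

```python
import tensorflow as tf

latent_dim = 50  # assumption: the latent dimensionality is not quoted in this table

# Decoder p_phi(x | z): two fully connected layers of 1024 softplus units,
# outputting Bernoulli activation probabilities for each of the 784 MNIST pixels.
decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(1024, activation="softplus"),
    tf.keras.layers.Dense(1024, activation="softplus"),
    tf.keras.layers.Dense(784, activation="sigmoid"),
])

# Encoder q_psi(z | x): same architecture, returning the parameters of the
# Gaussian approximate posterior.
encoder = tf.keras.Sequential([
    tf.keras.layers.Dense(1024, activation="softplus"),
    tf.keras.layers.Dense(1024, activation="softplus"),
    tf.keras.layers.Dense(2 * latent_dim),  # first half: mean, second half: log-variance (assumed parameterization)
])

# Optimizer and batch size as quoted: Adam with learning rate 10^-3, batch size 200.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
batch_size = 200
```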