Generalizing Hamiltonian Monte Carlo with Neural Networks
Authors: Daniel Levy, Matt D. Hoffman, Jascha Sohl-Dickstein
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate large empirical gains on a collection of simple but challenging distributions, for instance achieving a 10^6× improvement in effective sample size in one case, and mixing when standard HMC makes no measurable progress in a second. Finally, we show quantitative and qualitative gains on a real-world task: latent-variable generative modeling. |
| Researcher Affiliation | Collaboration | Daniel Levy (Stanford University), Matthew D. Hoffman (Google AI Perception), Jascha Sohl-Dickstein (Google Brain); danilevy@cs.stanford.edu, {mhoffman,jaschasd}@google.com |
| Pseudocode | Yes | Algorithm 1 Training L2HMC; Algorithm 2 L2HMC for latent variable generative models |
| Open Source Code | Yes | We release an open source TensorFlow implementation of the algorithm. Code implementing our algorithm is available online at https://github.com/brain-research/l2hmc. |
| Open Datasets | Yes | All experiments were done on the dynamically binarized MNIST dataset (LeCun). |
| Dataset Splits | No | The paper uses the MNIST dataset but does not specify explicit training, validation, and test splits (e.g., percentages or sample counts). It refers to "training and held-out data" but without numerical details for the split. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It does not mention any particular processor, GPU, or cloud instance type. |
| Software Dependencies | No | The paper mentions "TensorFlow" but does not specify a version number or list other software dependencies with their respective versions. |
| Experiment Setup | Yes | Our decoder (pφ) is a neural network with 2 fully connected layers, with 1024 units each and softplus non-linearities, and outputs Bernoulli activation probabilities for each pixel. The encoder (qψ) has the same architecture, returning mean and variance for the approximate posterior. Our model was trained for 300 epochs with Adam (Kingma & Ba, 2014) and a learning rate α = 10⁻³. We train with Adam (Kingma & Ba, 2014) and a learning rate α = 10⁻³. We train for 5,000 iterations with a batch size of 200. (A minimal code sketch of this setup follows the table.) |
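
For concreteness, the following is a minimal sketch (not the authors' released code) of the decoder/encoder architecture and training settings quoted in the Experiment Setup row, written with tf.keras. The 28×28 input matches the binarized MNIST dataset noted above; the latent dimensionality and the log-variance parameterization of the posterior are assumptions, as the quoted setup does not state them.

```python
import tensorflow as tf

LATENT_DIM = 50        # assumption: latent size is not given in the quoted setup
BATCH_SIZE = 200       # batch size reported for sampler training
TRAIN_ITERATIONS = 5_000  # sampler training iterations, per the quoted setup
VAE_EPOCHS = 300       # decoder/encoder training epochs, per the quoted setup

# Decoder p_phi(x|z): two fully connected layers of 1024 units with softplus
# non-linearities, outputting Bernoulli activation probabilities per pixel.
decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(1024, activation="softplus"),
    tf.keras.layers.Dense(1024, activation="softplus"),
    tf.keras.layers.Dense(28 * 28, activation="sigmoid"),
])

# Encoder q_psi(z|x): same architecture, returning parameters of the Gaussian
# approximate posterior (here as concatenated mean and log-variance, a common
# parameterization; the paper states mean and variance).
encoder = tf.keras.Sequential([
    tf.keras.layers.Dense(1024, activation="softplus"),
    tf.keras.layers.Dense(1024, activation="softplus"),
    tf.keras.layers.Dense(2 * LATENT_DIM),
])

# Optimizer as reported: Adam with learning rate 1e-3.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
```

This sketch covers only the latent-variable generative model described in the setup row; the L2HMC sampler training itself (Algorithms 1 and 2 referenced in the Pseudocode row) is not reproduced here.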