Generalizing Hamiltonian Monte Carlo with Neural Networks

Authors: Daniel Levy, Matthew D. Hoffman, Jascha Sohl-Dickstein

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate large empirical gains on a collection of simple but challenging distributions, for instance achieving a 106× improvement in effective sample size in one case, and mixing when standard HMC makes no measurable progress in a second. Finally, we show quantitative and qualitative gains on a real-world task: latent-variable generative modeling. (A sketch of how effective sample size can be estimated is given below.)
Researcher Affiliation | Collaboration | Daniel Levy (Stanford University), Matthew D. Hoffman (Google AI Perception), Jascha Sohl-Dickstein (Google Brain); danilevy@cs.stanford.edu, {mhoffman,jaschasd}@google.com
Pseudocode | Yes | Algorithm 1: Training L2HMC; Algorithm 2: L2HMC for latent variable generative models
Open Source Code | Yes | We release an open source TensorFlow implementation of the algorithm. Code implementing our algorithm is available online at https://github.com/brain-research/l2hmc.
Open Datasets | Yes | All experiments were done on the dynamically binarized MNIST dataset (LeCun). (A sketch of dynamic binarization is given below.)
Dataset Splits | No | The paper uses the MNIST dataset but does not specify explicit training, validation, and test splits (e.g., percentages or sample counts). It refers to "training and held-out data" but gives no numerical details for the split.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for its experiments. It does not mention any particular processor, GPU, or cloud instance type.
Software Dependencies | No | The paper mentions TensorFlow but does not specify a version number or list other software dependencies with their respective versions.
Experiment Setup | Yes | Our decoder (pφ) is a neural network with 2 fully connected layers, with 1024 units each and softplus non-linearities, and outputs Bernoulli activation probabilities for each pixel. The encoder (qψ) has the same architecture, returning mean and variance for the approximate posterior. Our model was trained for 300 epochs with Adam (Kingma & Ba, 2014) and a learning rate α = 10^-3. We train with Adam (Kingma & Ba, 2014) and a learning rate α = 10^-3. We train for 5,000 iterations with a batch size of 200. (A sketch of this setup is given below.)
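
Effective sample size (ESS) is the metric behind the quoted 106× claim. The paper's exact estimator is not reproduced in this table, so the following is a minimal sketch of a common single-chain ESS estimate; the function name `effective_sample_size` and the simple truncation rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def effective_sample_size(chain):
    """Crude ESS estimate for a 1-D MCMC chain (illustrative, not the paper's estimator).

    ESS = N / (1 + 2 * sum_t rho_t), where rho_t is the lag-t autocorrelation,
    summed until it first drops below zero (a simple truncation rule).
    """
    x = np.asarray(chain, dtype=float)
    n = len(x)
    x = x - x.mean()
    # Empirical autocorrelation at lags 0..n-1.
    acf = np.correlate(x, x, mode="full")[n - 1:] / (np.arange(n, 0, -1) * x.var())
    tau = 1.0  # accumulates 1 + 2 * sum of positive-lag autocorrelations
    for t in range(1, n):
        if acf[t] < 0:
            break
        tau += 2.0 * acf[t]
    return n / tau

# Example usage: compare two samplers run on the same target distribution.
# ess_ratio = effective_sample_size(l2hmc_chain) / effective_sample_size(hmc_chain)
```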
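"Dynamically binarized MNIST" usually means that each grayscale pixel is treated as a Bernoulli probability and re-sampled every time an example is drawn. The paper does not spell the procedure out beyond citing LeCun's MNIST, so the sketch below reflects that common convention; the helper name `dynamically_binarize` is made up for illustration.

```python
import numpy as np

def dynamically_binarize(batch, rng=None):
    """Resample binary pixels each time a batch is drawn.

    `batch`: array of grayscale MNIST pixels scaled to [0, 1]. Each pixel value
    is used as the probability of that pixel being 1, and a fresh Bernoulli
    sample is drawn on every call, so the model never sees the same
    binarization twice.
    """
    rng = np.random.default_rng() if rng is None else rng
    return (rng.uniform(size=batch.shape) < batch).astype(np.float32)
```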
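The Experiment Setup row pins down the decoder/encoder architecture and optimizer settings. Below is a minimal sketch of that configuration in TF2/Keras; the released implementation uses an earlier TensorFlow API, the latent dimensionality is not quoted in this table (so `latent_dim = 50` is an assumption), and the log-variance parameterization of the encoder output is a common convention rather than a detail stated in the quote.

```python
import tensorflow as tf

latent_dim = 50  # assumption: the latent dimensionality is not quoted in this table

# Decoder p_phi(x | z): two fully connected layers of 1024 softplus units,
# outputting Bernoulli activation probabilities for each of the 784 MNIST pixels.
decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(1024, activation="softplus"),
    tf.keras.layers.Dense(1024, activation="softplus"),
    tf.keras.layers.Dense(784, activation="sigmoid"),
])

# Encoder q_psi(z | x): same architecture, returning the parameters of the
# Gaussian approximate posterior.
encoder = tf.keras.Sequential([
    tf.keras.layers.Dense(1024, activation="softplus"),
    tf.keras.layers.Dense(1024, activation="softplus"),
    tf.keras.layers.Dense(2 * latent_dim),  # first half: mean, second half: log-variance (assumed parameterization)
])

# Optimizer and batch size as quoted: Adam with learning rate 10^-3, batch size 200.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
batch_size = 200
```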