Auto-Encoding Variational Bayes

Authors: Diederik P. Kingma; Max Welling

ICLR 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We trained generative models of images from the MNIST and Frey Face datasets and compared learning algorithms in terms of the variational lower bound, and the estimated marginal likelihood.
Researcher Affiliation | Academia | Diederik P. Kingma, Machine Learning Group, Universiteit van Amsterdam, dpkingma@gmail.com; Max Welling, Machine Learning Group, Universiteit van Amsterdam, welling.max@gmail.com
Pseudocode | Yes | Algorithm 1: Minibatch version of the Auto-Encoding VB (AEVB) algorithm. (A hedged re-implementation sketch follows the table.)
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets | Yes | We trained generative models of images from the MNIST and Frey Face datasets and compared learning algorithms in terms of the variational lower bound, and the estimated marginal likelihood. [...] Available at http://www.cs.nyu.edu/~roweis/data.html
Dataset Splits | No | The paper mentions training and test sets but does not specify a separate validation split or how it was used for hyperparameter tuning. 'Stepsizes were adapted with Adagrad [DHS10]; the Adagrad global stepsize parameters were chosen from {0.01, 0.02, 0.1} based on performance on the training set in the first few iterations.'
Hardware Specification | Yes | Computation took around 20-40 minutes per million training samples with an Intel Xeon CPU running at an effective 40 GFLOPS.
Software Dependencies | No | The paper mentions optimization methods like SGD and Adagrad but does not specify software libraries with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | Parameters are updated using stochastic gradient ascent where gradients are computed by differentiating the lower bound estimator ∇θ,φ L(θ, φ; X) (see Algorithm 1), plus a small weight decay term corresponding to a prior p(θ) = N(0, I). [...] Stepsizes were adapted with Adagrad [DHS10]; the Adagrad global stepsize parameters were chosen from {0.01, 0.02, 0.1} based on performance on the training set in the first few iterations. Minibatches of size M = 100 were used, with L = 1 samples per datapoint. (A hedged training-loop sketch follows the table.)
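The sketch below re-implements Algorithm 1 (minibatch AEVB) in PyTorch under the paper's MNIST setup: a Gaussian MLP encoder with tanh hidden units, a Bernoulli MLP decoder, the reparameterization z = μ + σ ⊙ ε with ε ~ N(0, I), and the closed-form KL term for a Gaussian posterior against a standard-normal prior (Appendix B). The paper predates PyTorch, so every name here (VAE, elbo_estimate, the layer sizes) is an assumption of this sketch, not the authors' code.

```python
# Minimal re-implementation sketch of Algorithm 1 (minibatch AEVB) in PyTorch.
# All names and layer sizes are assumptions of this sketch, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VAE(nn.Module):
    def __init__(self, x_dim=784, h_dim=500, z_dim=20):
        super().__init__()
        # Encoder (recognition model q_phi(z|x)): one tanh hidden layer, as in the paper.
        self.enc_h = nn.Linear(x_dim, h_dim)
        self.enc_mu = nn.Linear(h_dim, z_dim)
        self.enc_logvar = nn.Linear(h_dim, z_dim)
        # Decoder (generative model p_theta(x|z)): Bernoulli outputs for MNIST pixels.
        self.dec_h = nn.Linear(z_dim, h_dim)
        self.dec_logits = nn.Linear(h_dim, x_dim)

    def encode(self, x):
        h = torch.tanh(self.enc_h(x))
        return self.enc_mu(h), self.enc_logvar(h)

    def decode(self, z):
        h = torch.tanh(self.dec_h(z))
        return self.dec_logits(h)

    def forward(self, x):
        mu, logvar = self.encode(x)
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I),
        # which makes the sampling step differentiable w.r.t. mu and logvar.
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * logvar) * eps
        return self.decode(z), mu, logvar


def elbo_estimate(model, x):
    """SGVB estimator of the variational lower bound for one minibatch (L = 1 sample)."""
    logits, mu, logvar = model(x)
    # E_q[log p(x|z)] approximated with the single reparameterized sample.
    log_px_z = -F.binary_cross_entropy_with_logits(logits, x, reduction="sum")
    # -KL(q(z|x) || p(z)) in closed form for Gaussian q and standard-normal prior.
    neg_kl = 0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return log_px_z + neg_kl
```

As the paper notes, a single sample per datapoint (L = 1) suffices in practice provided the minibatch size M is large enough, e.g. M = 100.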
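The following sketch reproduces the quoted experiment setup: minibatches of M = 100, L = 1 sample per datapoint, Adagrad with a global stepsize drawn from {0.01, 0.02, 0.1}, and a small weight-decay term standing in for the prior p(θ) = N(0, I). It reuses the VAE class and elbo_estimate() defined in the sketch above; the random tensor is a placeholder for binarized MNIST, and the weight-decay value is an assumption since the paper only calls it "small".

```python
# Hedged sketch of the quoted training setup; reuses VAE and elbo_estimate() from above.
import torch
from torch.utils.data import DataLoader, TensorDataset

x_train = torch.rand(10_000, 784)                      # placeholder for binarized MNIST
loader = DataLoader(TensorDataset(x_train), batch_size=100, shuffle=True)  # M = 100

model = VAE()
# Adagrad global stepsize from {0.01, 0.02, 0.1}; the weight_decay value is an
# assumption, standing in for the prior p(theta) = N(0, I) mentioned in the paper.
optimizer = torch.optim.Adagrad(model.parameters(), lr=0.01, weight_decay=1e-4)

for epoch in range(5):
    for (x,) in loader:
        optimizer.zero_grad()
        # Maximize the lower bound estimate <=> minimize its negative (L = 1 sample).
        loss = -elbo_estimate(model, x) / x.size(0)
        loss.backward()
        optimizer.step()
```

Per the quoted setup, one would run this loop once per candidate stepsize in {0.01, 0.02, 0.1} and keep whichever performs best on the training set in the first few iterations.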