Auto-Encoding Variational Bayes
Authors: Diederik P. Kingma; Max Welling
ICLR 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We trained generative models of images from the MNIST and Frey Face datasets and compared learning algorithms in terms of the variational lower bound, and the estimated marginal likelihood. |
| Researcher Affiliation | Academia | Diederik P. Kingma, Machine Learning Group, Universiteit van Amsterdam, dpkingma@gmail.com; Max Welling, Machine Learning Group, Universiteit van Amsterdam, welling.max@gmail.com |
| Pseudocode | Yes | Algorithm 1 Minibatch version of the Auto-Encoding VB (AEVB) algorithm. (A minimal code sketch of this algorithm follows the table.) |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology. |
| Open Datasets | Yes | We trained generative models of images from the MNIST and Frey Face datasets and compared learning algorithms in terms of the variational lower bound, and the estimated marginal likelihood. [...] Available at http://www.cs.nyu.edu/~roweis/data.html |
| Dataset Splits | No | The paper mentions training and test sets but does not specify a separate validation split or how it was used for hyperparameter tuning. 'Stepsizes were adapted with Adagrad [DHS10]; the Adagrad global stepsize parameters were chosen from {0.01, 0.02, 0.1} based on performance on the training set in the first few iterations.' |
| Hardware Specification | Yes | Computation took around 20-40 minutes per million training samples with an Intel Xeon CPU running at an effective 40 GFLOPS. |
| Software Dependencies | No | The paper mentions optimization methods like SGD and Adagrad but does not specify software libraries with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Parameters are updated using stochastic gradient ascent where gradients are computed by differentiating the lower bound estimator ∇_{θ,φ} L̃(θ, φ; X^M) (see algorithm 1), plus a small weight decay term corresponding to a prior p(θ) = N(0, I). [...] Stepsizes were adapted with Adagrad [DHS10]; the Adagrad global stepsize parameters were chosen from {0.01, 0.02, 0.1} based on performance on the training set in the first few iterations. Minibatches of size M = 100 were used, with L = 1 samples per datapoint. (A sketch of this update rule follows the table.) |
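
The Pseudocode row above refers to Algorithm 1, the minibatch version of AEVB. As a rough illustration, the sketch below implements one AEVB minibatch step in JAX with the reparameterization trick, a Gaussian MLP encoder, and a Bernoulli MLP decoder, broadly following the paper's Appendices B and C. The layer sizes, initialisation scale, and plain-SGD update are assumptions made for brevity, not the authors' exact configuration.

```python
# A minimal AEVB (Algorithm 1) sketch in JAX. Gaussian MLP encoder q(z|x),
# Bernoulli MLP decoder p(x|z); sizes and SGD update are illustrative assumptions.
import jax
import jax.numpy as jnp

def init_params(key, x_dim=784, h_dim=500, z_dim=20):
    """Randomly initialise encoder/decoder weights (sizes are assumptions)."""
    ks = jax.random.split(key, 5)
    def dense(k, n_in, n_out):
        return (0.01 * jax.random.normal(k, (n_in, n_out)), jnp.zeros(n_out))
    return {
        "enc_h":  dense(ks[0], x_dim, h_dim),   # x -> hidden
        "enc_mu": dense(ks[1], h_dim, z_dim),   # hidden -> mean of q(z|x)
        "enc_lv": dense(ks[2], h_dim, z_dim),   # hidden -> log variance of q(z|x)
        "dec_h":  dense(ks[3], z_dim, h_dim),   # z -> hidden
        "dec_p":  dense(ks[4], h_dim, x_dim),   # hidden -> Bernoulli logits
    }

def affine(layer, x):
    W, b = layer
    return x @ W + b

def neg_elbo(params, x, eps):
    """Monte Carlo estimate of -L(theta, phi; x) using L = 1 noise sample eps."""
    h = jnp.tanh(affine(params["enc_h"], x))
    mu = affine(params["enc_mu"], h)
    logvar = affine(params["enc_lv"], h)
    # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I).
    z = mu + jnp.exp(0.5 * logvar) * eps
    logits = affine(params["dec_p"], jnp.tanh(affine(params["dec_h"], z)))
    # Bernoulli log-likelihood log p(x|z), summed over pixels.
    log_px = jnp.sum(x * jax.nn.log_sigmoid(logits)
                     + (1.0 - x) * jax.nn.log_sigmoid(-logits), axis=-1)
    # Analytical -KL(q(z|x) || N(0, I)), as in the paper's Appendix B.
    neg_kl = 0.5 * jnp.sum(1.0 + logvar - mu**2 - jnp.exp(logvar), axis=-1)
    return -jnp.mean(log_px + neg_kl)

@jax.jit
def aevb_step(params, x_batch, eps, lr=0.02):
    """One minibatch update of Algorithm 1 (plain SGD here for brevity)."""
    loss, grads = jax.value_and_grad(neg_elbo)(params, x_batch, eps)
    params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return params, loss
```

In the full Algorithm 1, this step would be repeated over randomly drawn minibatches of M = 100 datapoints, with a fresh eps ~ N(0, I) (L = 1 sample per datapoint) sampled at every iteration, until the parameters converge.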
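The Experiment Setup row describes stochastic gradient ascent with Adagrad-adapted stepsizes and a small weight-decay term corresponding to the prior p(θ) = N(0, I). The sketch below shows one such update in JAX. The weight-decay coefficient and the smoothing constant `eps` are illustrative assumptions; only the global stepsize grid {0.01, 0.02, 0.1} and the minibatch size come from the paper.

```python
# Adagrad ascent step with a weight-decay term, as described in the
# Experiment Setup row. The weight_decay and eps values are assumptions;
# only the global stepsize grid {0.01, 0.02, 0.1} is from the paper.
import jax
import jax.numpy as jnp

def adagrad_ascent_step(params, grads, accum,
                        global_stepsize=0.02,  # chosen from {0.01, 0.02, 0.1}
                        weight_decay=1e-3,     # assumed coefficient for p(theta) = N(0, I)
                        eps=1e-8):             # assumed smoothing constant
    """One Adagrad update on ascent gradients of the lower bound estimator."""
    # Add the gradient of the log-prior, i.e. a weight-decay pull toward zero.
    grads = jax.tree_util.tree_map(lambda g, p: g - weight_decay * p, grads, params)
    # Accumulate squared gradients (the per-parameter Adagrad state).
    accum = jax.tree_util.tree_map(lambda a, g: a + g**2, accum, grads)
    # Ascent step with per-parameter stepsize global_stepsize / sqrt(accum).
    params = jax.tree_util.tree_map(
        lambda p, g, a: p + global_stepsize * g / (jnp.sqrt(a) + eps),
        params, grads, accum)
    return params, accum
```

Here `accum` starts as a pytree of zeros with the same shapes as `params`, and the update is applied once per minibatch of M = 100 datapoints in the reported setup.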