Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Auto-Encoding Variational Bayes
Authors: Diederik P. Kingma; Max Welling
ICLR 2014 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We trained generative models of images from the MNIST and Frey Face datasets³ and compared learning algorithms in terms of the variational lower bound, and the estimated marginal likelihood. |
| Researcher Affiliation | Academia | Diederik P. Kingma Machine Learning Group Universiteit van Amsterdam EMAIL Max Welling Machine Learning Group Universiteit van Amsterdam EMAIL |
| Pseudocode | Yes | Algorithm 1 Minibatch version of the Auto-Encoding VB (AEVB) algorithm. |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology. |
| Open Datasets | Yes | We trained generative models of images from the MNIST and Frey Face datasets³ and compared learning algorithms in terms of the variational lower bound, and the estimated marginal likelihood. [...] ³Available at http://www.cs.nyu.edu/~roweis/data.html |
| Dataset Splits | No | The paper mentions training and test sets but does not specify a separate validation split or how it was used for hyperparameter tuning. 'Stepsizes were adapted with Adagrad [DHS10]; the Adagrad global stepsize parameters were chosen from {0.01, 0.02, 0.1} based on performance on the training set in the first few iterations.' |
| Hardware Specification | Yes | Computation took around 20-40 minutes per million training samples with a Intel Xeon CPU running at an effective 40 GFLOPS. |
| Software Dependencies | No | The paper mentions optimization methods like SGD and Adagrad but does not specify software libraries with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Parameters are updated using stochastic gradient ascent where gradients are computed by differentiating the lower bound estimator $\nabla_{\theta,\phi}\mathcal{L}(\theta, \phi; X)$ (see algorithm 1), plus a small weight decay term corresponding to a prior $p(\theta) = \mathcal{N}(0, I)$. [...] Stepsizes were adapted with Adagrad [DHS10]; the Adagrad global stepsize parameters were chosen from {0.01, 0.02, 0.1} based on performance on the training set in the first few iterations. Minibatches of size M = 100 were used, with L = 1 samples per datapoint. |
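
The optimization recipe quoted in the Experiment Setup row (gradient ascent with Adagrad stepsizes, a weight decay term from a standard normal prior, minibatches of 100, and a global stepsize picked from {0.01, 0.02, 0.1} by early training-set performance) can be illustrated with a minimal sketch. This is not the authors' code: the objective is a hypothetical one-parameter Gaussian log-likelihood standing in for the variational lower bound, and `toy_objective`, `toy_gradient`, `prior_weight`, `n_probe_iters`, and the synthetic data are all assumptions made for illustration.

```python
import math
import random

STEPSIZES = (0.01, 0.02, 0.1)  # candidate Adagrad global stepsizes quoted in the table
M = 100                        # minibatch size quoted in the table


def adagrad_ascent_step(param, grad, accum, stepsize, eps=1e-8):
    """One Adagrad update in the ascent direction; returns (param, accum)."""
    accum += grad * grad
    param += stepsize * grad / (math.sqrt(accum) + eps)
    return param, accum


def toy_objective(param, xs):
    """Hypothetical stand-in for the lower bound: mean Gaussian log-likelihood up to a constant."""
    return -0.5 * sum((x - param) ** 2 for x in xs) / len(xs)


def toy_gradient(param, batch, prior_weight):
    """Minibatch gradient of the toy objective plus a weight decay term."""
    data_grad = sum(x - param for x in batch) / len(batch)
    prior_grad = -param  # gradient of log N(param; 0, 1) with respect to param
    # prior_weight is an assumed scaling; the quoted text only says the term is "small".
    return data_grad + prior_weight * prior_grad


def train(xs, stepsize, n_iters, prior_weight=1e-3, seed=0):
    """Run minibatch Adagrad ascent from param = 0 and return the final parameter."""
    rng = random.Random(seed)
    param, accum = 0.0, 0.0
    for _ in range(n_iters):
        batch = rng.sample(xs, M)
        grad = toy_gradient(param, batch, prior_weight)
        param, accum = adagrad_ascent_step(param, grad, accum, stepsize)
    return param


def select_stepsize(xs, n_probe_iters=20):
    """Pick the candidate stepsize with the best training objective after a few iterations."""
    scores = {s: toy_objective(train(xs, s, n_probe_iters), xs) for s in STEPSIZES}
    return max(scores, key=scores.get)


# Synthetic training data (an assumption; the paper used MNIST and Frey Face).
data_rng = random.Random(1)
x_train = [data_rng.gauss(2.0, 1.0) for _ in range(1000)]

best_stepsize = select_stepsize(x_train)
fitted = train(x_train, best_stepsize, n_iters=2000)
print(best_stepsize, round(fitted, 2))
```

Selecting the stepsize on the training set, as the quoted passage describes, means no held-out validation data is involved in tuning, which is consistent with the "No" recorded for Dataset Splits.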