Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors

Authors: Christos Louizos, Max Welling

ICML 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The validity of the proposed approach is verified through extensive experiments. ... 4. Experiments All of the models were coded in Theano (Bergstra et al., 2010) and optimization was done with Adam (Kingma & Ba, 2015), using the default hyper-parameters and temporal averaging.
Researcher Affiliation | Academia | Christos Louizos (C.LOUIZOS@UVA.NL), AMLAB, Informatics Institute, University of Amsterdam; Max Welling (M.WELLING@UVA.NL), AMLAB, Informatics Institute, University of Amsterdam; Canadian Institute for Advanced Research (CIFAR)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the described methodology.
Open Datasets | Yes | For the regression task we experimented with the UCI (Asuncion & Newman, 2007) datasets that were used in Probabilistic Backpropagation (PBP) (Hernández-Lobato & Adams, 2015) and in Dropout as a Bayesian Approximation (Gal & Ghahramani, 2015). For the classification task we evaluated our model on the permutation invariant MNIST benchmark dataset.
Dataset Splits | Yes | For the regression experiments we followed a similar experimental protocol with (Hernández-Lobato & Adams, 2015): we randomly keep 90% of the dataset for training and use the remaining to test the performance. ... For the classification experiments ... We used the last 10000 samples of the training set as a validation set for model selection. (A sketch of this split protocol is given below the table.)
Hardware Specification | No | The paper mentions that models were 'coded in Theano' but does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for the experiments.
Software Dependencies | No | The paper names its software stack ("All of the models were coded in Theano (Bergstra et al., 2010) and optimization was done with Adam (Kingma & Ba, 2015), using the default hyper-parameters and temporal averaging.") but does not specify library versions or other dependency details.
Experiment Setup | Yes | All of the models were coded in Theano (Bergstra et al., 2010) and optimization was done with Adam (Kingma & Ba, 2015), using the default hyper-parameters and temporal averaging. We parametrized the prior for each weight matrix as p(W) = MN(0, I, I) unless stated otherwise. ... We used rectified linear units (ReLU) and we initialized the mean of each matrix variate Gaussian via the scheme proposed in (He et al., 2015). For the initialization of the pseudo-data we sampled the entries of A, B from U[-0.01, 0.01]. We used one posterior sample to estimate the expected log-likelihood before we update the parameters. ... We use one hidden layer of 50 units for all of the datasets, except for the larger Protein and Year datasets where we use 100 units. ... Similarly to (Gal & Ghahramani, 2015) we set the upper bound of the variational dropout rate to 0.005, 0.05 and we used 10 pseudo-data pairs for each layer for all of the datasets, except for the smaller Yacht dataset where we used 5 and the bigger Protein and Year where we used 20. ... minibatches of 100 datapoints and set the upper bound for the variational dropout rate to 0.25. We used the same amount of pseudo-data pairs for each layer, but tuned those according to the validation set performance (we set an upper bound of 150 pseudo-data pairs per layer). (An initialization sketch based on these settings follows the table.)
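
The quoted split protocol (random 90/10 train/test for the UCI regression data, last 10,000 training samples held out as an MNIST validation set) can be written down in a few lines. The sketch below is not the authors' code; the array names X, y, X_train_full, y_train_full, the seed, and the helper names uci_split / mnist_validation_split are assumptions made here for illustration.

```python
import numpy as np

def uci_split(X, y, train_fraction=0.9, seed=0):
    """Randomly keep 90% of a UCI dataset for training; test on the rest."""
    rng = np.random.RandomState(seed)
    perm = rng.permutation(len(X))
    n_train = int(train_fraction * len(X))
    train_idx, test_idx = perm[:n_train], perm[n_train:]
    return (X[train_idx], y[train_idx]), (X[test_idx], y[test_idx])

def mnist_validation_split(X_train_full, y_train_full, n_valid=10000):
    """Hold out the last 10,000 training samples as a validation set."""
    X_train, y_train = X_train_full[:-n_valid], y_train_full[:-n_valid]
    X_valid, y_valid = X_train_full[-n_valid:], y_train_full[-n_valid:]
    return (X_train, y_train), (X_valid, y_valid)
```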
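
The initialization choices quoted in the Experiment Setup row can likewise be sketched. This is a minimal illustration, not the authors' implementation: the posterior mean M of each weight matrix is drawn with the He et al. (2015) fan-in scheme, and the pseudo-data matrices A, B are filled with entries from U[-0.01, 0.01]. The helper name init_layer, the shapes assigned to A and B, and the 13-dimensional input used as a placeholder are all assumptions made here.

```python
import numpy as np

rng = np.random.RandomState(0)

def init_layer(n_in, n_out, n_pseudo):
    # He et al. (2015) initialization for the posterior mean (fan-in scaled Gaussian).
    M = rng.randn(n_in, n_out) * np.sqrt(2.0 / n_in)
    # Pseudo-data pairs; entries sampled uniformly from [-0.01, 0.01].
    # The shapes below are illustrative assumptions, not taken from the paper.
    A = rng.uniform(-0.01, 0.01, size=(n_pseudo, n_in))
    B = rng.uniform(-0.01, 0.01, size=(n_pseudo, n_out))
    return M, A, B

# Regression setup from the quote: one hidden layer of 50 ReLU units and
# 10 pseudo-data pairs per layer (input dimensionality is dataset-dependent;
# 13 is used here only as a placeholder).
layers = [init_layer(n_in, n_out, n_pseudo=10)
          for n_in, n_out in [(13, 50), (50, 1)]]
```

Optimization in the paper uses Adam with its default hyper-parameters (learning rate 0.001, beta1 = 0.9, beta2 = 0.999 in Kingma & Ba, 2015) together with temporal averaging of the parameters; one posterior sample is used to estimate the expected log-likelihood per update.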