Variational Dropout and the Local Reparameterization Trick
Authors: Durk P. Kingma, Tim Salimans, Max Welling
NeurIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The method is demonstrated through several experiments. ... 5 Experiments We compare our method to standard binary dropout and two popular versions of Gaussian dropout, which we'll denote with type A and type B. |
| Researcher Affiliation | Collaboration | Diederik P. Kingma, Tim Salimans and Max Welling. Machine Learning Group, University of Amsterdam; Algoritmica; University of California, Irvine, and the Canadian Institute for Advanced Research (CIFAR). D.P.Kingma@uva.nl, salimans.tim@gmail.com, M.Welling@uva.nl ... Diederik Kingma is supported by the Google European Fellowship in Deep Learning, Max Welling is supported by research grants from Google and Facebook, and the NWO project in Natural AI (NAI.14.108). |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-sourcing of the methodology's code. |
| Open Datasets | Yes | A de facto standard benchmark for regularization methods is the task of MNIST hand-written digit classification. |
| Dataset Splits | Yes | We used early stopping with all methods, where the amount of epochs to run was determined based on performance on a validation set. |
| Hardware Specification | No | With our implementation on a modern GPU, optimization with the naïve estimator took 1635 seconds per epoch, while the efficient estimator took 7.4 seconds: an over 200 fold speedup. |
| Software Dependencies | No | Models were implemented in Theano [5], and optimization was performed using Adam [12] with default hyper-parameters and temporal averaging. |
| Experiment Setup | Yes | Models were implemented in Theano [5], and optimization was performed using Adam [12] with default hyper-parameters and temporal averaging. ... We follow the dropout hyper-parameter recommendations from these earlier publications, which is a dropout rate of p = 0.5 for the hidden layers and p = 0.2 for the input layer. |
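
The over 200-fold speedup quoted in the Hardware Specification row is the gap between the naïve estimator, which samples weights and then multiplies, and the paper's local reparameterization trick, which samples a layer's pre-activations directly from their Gaussian marginal. The NumPy sketch below illustrates both estimators for a single fully connected layer; the shapes, initialization values, and variable names are illustrative assumptions and are not taken from the paper's implementation.

```python
# Minimal NumPy sketch (not the paper's code) of weight-space sampling vs. the
# local reparameterization trick for one fully connected layer with a fully
# factorized Gaussian posterior over the weights. Shapes and values are illustrative.
import numpy as np

rng = np.random.default_rng(0)

M, K, J = 128, 784, 1024                     # minibatch size, input dim, output dim (assumed)
A = rng.standard_normal((M, K))              # minibatch of layer inputs

theta = 0.01 * rng.standard_normal((K, J))   # posterior means of the weights
sigma2 = np.full((K, J), 1e-3)               # posterior variances of the weights

# Weight-space sampling: draw W ~ q(W) and multiply. Sharing one sample across
# the minibatch is cheap but its noise is correlated across examples; drawing a
# separate W per example removes that correlation but is far more expensive.
W = theta + np.sqrt(sigma2) * rng.standard_normal((K, J))
B_weight_sample = A @ W

# Local reparameterization: the marginal of the pre-activations B = A W is also
# factorized Gaussian, so sample B directly, per example, at minibatch cost:
#   mean     gamma = A @ theta
#   variance delta = (A ** 2) @ sigma2
gamma = A @ theta
delta = (A ** 2) @ sigma2
B_local = gamma + np.sqrt(delta) * rng.standard_normal((M, J))
```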
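
The Experiment Setup row fixes only the dropout rates (p = 0.2 on the input layer, p = 0.5 on the hidden layers) and the optimizer (Adam with default hyper-parameters); the paper's own models were written in Theano. As a rough illustration, a present-day PyTorch sketch of the binary-dropout baseline could look as follows, where the 784-1024-1024-10 MNIST architecture is an assumption rather than a detail quoted above.

```python
# Hedged PyTorch sketch of a binary-dropout MNIST baseline using the quoted
# hyper-parameters. The layer widths are assumptions; only the dropout rates
# and the use of Adam with default settings come from the quoted setup.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Flatten(),
    nn.Dropout(p=0.2),        # input-layer dropout rate quoted in the paper
    nn.Linear(784, 1024),
    nn.ReLU(),
    nn.Dropout(p=0.5),        # hidden-layer dropout rate quoted in the paper
    nn.Linear(1024, 1024),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(1024, 10),
)

optimizer = torch.optim.Adam(model.parameters())   # Adam with default hyper-parameters
criterion = nn.CrossEntropyLoss()

# Training would iterate over MNIST minibatches and stop early based on a
# held-out validation set, matching the Dataset Splits row above.
```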