Variational Dropout and the Local Reparameterization Trick
Authors: Durk P. Kingma, Tim Salimans, Max Welling
NeurIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The method is demonstrated through several experiments. ... 5 Experiments We compare our method to standard binary dropout and two popular versions of Gaussian dropout, which we'll denote with type A and type B. |
| Researcher Affiliation | Collaboration | Diederik P. Kingma, Tim Salimans and Max Welling. Machine Learning Group, University of Amsterdam; Algoritmica; University of California, Irvine, and the Canadian Institute for Advanced Research (CIFAR). D.P.Kingma@uva.nl, salimans.tim@gmail.com, M.Welling@uva.nl ... Diederik Kingma is supported by the Google European Fellowship in Deep Learning, Max Welling is supported by research grants from Google and Facebook, and the NWO project in Natural AI (NAI.14.108). |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-sourcing of the methodology's code. |
| Open Datasets | Yes | A de facto standard benchmark for regularization methods is the task of MNIST hand-written digit classification. |
| Dataset Splits | Yes | We used early stopping with all methods, where the amount of epochs to run was determined based on performance on a validation set. |
| Hardware Specification | No | With our implementation on a modern GPU, optimization with the naïve estimator took 1635 seconds per epoch, while the efficient estimator took 7.4 seconds: an over 200 fold speedup. |
| Software Dependencies | No | Models were implemented in Theano [5], and optimization was performed using Adam [12] with default hyper-parameters and temporal averaging. |
| Experiment Setup | Yes | Models were implemented in Theano [5], and optimization was performed using Adam [12] with default hyper-parameters and temporal averaging. ... We follow the dropout hyper-parameter recommendations from these earlier publications, which is a dropout rate of p = 0.5 for the hidden layers and p = 0.2 for the input layer. |
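
The over 200-fold speedup quoted in the Hardware Specification row is the gap between the naïve estimator, which samples weights and then multiplies, and the paper's local reparameterization trick, which samples a layer's pre-activations directly from their Gaussian marginal. The NumPy sketch below illustrates both estimators for a single fully connected layer; the shapes, initialization values, and variable names are illustrative assumptions and are not taken from the paper's implementation.

```python
# Minimal NumPy sketch (not the paper's code) of weight-space sampling vs. the
# local reparameterization trick for one fully connected layer with a fully
# factorized Gaussian posterior over the weights. Shapes and values are illustrative.
import numpy as np

rng = np.random.default_rng(0)

M, K, J = 128, 784, 1024                     # minibatch size, input dim, output dim (assumed)
A = rng.standard_normal((M, K))              # minibatch of layer inputs

theta = 0.01 * rng.standard_normal((K, J))   # posterior means of the weights
sigma2 = np.full((K, J), 1e-3)               # posterior variances of the weights

# Weight-space sampling: draw W ~ q(W) and multiply. Sharing one sample across
# the minibatch is cheap but its noise is correlated across examples; drawing a
# separate W per example removes that correlation but is far more expensive.
W = theta + np.sqrt(sigma2) * rng.standard_normal((K, J))
B_weight_sample = A @ W

# Local reparameterization: the marginal of the pre-activations B = A W is also
# factorized Gaussian, so sample B directly, per example, at minibatch cost:
#   mean     gamma = A @ theta
#   variance delta = (A ** 2) @ sigma2
gamma = A @ theta
delta = (A ** 2) @ sigma2
B_local = gamma + np.sqrt(delta) * rng.standard_normal((M, J))
```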
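
The Experiment Setup row fixes only the dropout rates (p = 0.2 on the input layer, p = 0.5 on the hidden layers) and the optimizer (Adam with default hyper-parameters); the paper's own models were written in Theano. As a rough illustration, a present-day PyTorch sketch of the binary-dropout baseline could look as follows, where the 784-1024-1024-10 MNIST architecture is an assumption rather than a detail quoted above.

```python
# Hedged PyTorch sketch of a binary-dropout MNIST baseline using the quoted
# hyper-parameters. The layer widths are assumptions; only the dropout rates
# and the use of Adam with default settings come from the quoted setup.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Flatten(),
    nn.Dropout(p=0.2),        # input-layer dropout rate quoted in the paper
    nn.Linear(784, 1024),
    nn.ReLU(),
    nn.Dropout(p=0.5),        # hidden-layer dropout rate quoted in the paper
    nn.Linear(1024, 1024),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(1024, 10),
)

optimizer = torch.optim.Adam(model.parameters())   # Adam with default hyper-parameters
criterion = nn.CrossEntropyLoss()

# Training would iterate over MNIST minibatches and stop early based on a
# held-out validation set, matching the Dataset Splits row above.
```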