A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
Authors: Yarin Gal, Zoubin Ghahramani
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply this new variational inference based dropout technique in LSTM and GRU models, assessing it on language modelling and sentiment analysis tasks. The new approach outperforms existing techniques, and to the best of our knowledge improves on the single model state-of-the-art in language modelling with the Penn Treebank (73.4 test perplexity). This extends our arsenal of variational tools in deep learning. |
| Researcher Affiliation | Academia | University of Cambridge {yg279,zg201}@cam.ac.uk |
| Pseudocode | No | No pseudocode or algorithm blocks were found. |
| Open Source Code | No | The paper does not explicitly state that its source code is publicly available or provide a link to it. |
| Open Datasets | Yes | We replicate the language modelling experiment of Zaremba, Sutskever, and Vinyals [4]. The experiment uses the Penn Treebank, a standard benchmark in the field. |
| Dataset Splits | Yes | We replicate the language modelling experiment of Zaremba, Sutskever, and Vinyals [4]. |
| Hardware Specification | Yes | Assessing model run time though (on a Titan X GPU) |
| Software Dependencies | No | The paper mentions 'Torch implementation' but does not specify its version or the versions of other key software dependencies used for the experiments. |
| Experiment Setup | Yes | All other hyper-parameters are kept identical to [4]: learning rate decay was not tuned for our setting and is used following [4]. Dropout parameters were optimised with grid search (tying the dropout probability over the embeddings together with the one over the recurrent layers, and tying the dropout probability for the inputs and outputs together as well). These are chosen to minimise validation perplexity. Optimal probabilities are 0.3 and 0.5 respectively for the large model, compared to [4]'s 0.6 dropout probability, and 0.2 and 0.35 respectively for the medium model, compared to [4]'s 0.5 dropout probability. |
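
The technique assessed above samples one Bernoulli dropout mask per sequence and reuses it at every time step for the inputs, outputs, recurrent connections, and embeddings, instead of resampling a fresh mask at each step. The sketch below is a minimal, hypothetical PyTorch rendering of that idea for a single LSTM layer; the paper's own implementation was in Torch and no code release is cited, so the class name `VariationalLSTM`, the `p_input`/`p_hidden` parameters, and the default probabilities (0.35 for the input/output mask and 0.2 for the recurrent mask, mirroring the medium-model values quoted in the Experiment Setup row) are illustrative assumptions, and embedding dropout is omitted for brevity.

```python
# Hedged sketch: variational (tied-mask) dropout around an LSTM step loop.
# Not the authors' code; an assumed PyTorch illustration of the idea.
import torch
import torch.nn as nn

class VariationalLSTM(nn.Module):
    """One LSTM layer with Gal & Ghahramani-style dropout.

    A single Bernoulli mask is sampled per sequence for the inputs and
    another for the recurrent state, and each mask is reused at every
    time step (standard dropout would resample per step).
    """

    def __init__(self, input_size, hidden_size, p_input=0.35, p_hidden=0.2):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.p_input = p_input    # dropout on the input connections
        self.p_hidden = p_hidden  # dropout on the recurrent connections

    def forward(self, x):                      # x: (seq_len, batch, input_size)
        seq_len, batch, _ = x.shape
        h = x.new_zeros(batch, self.cell.hidden_size)
        c = x.new_zeros(batch, self.cell.hidden_size)

        # Sample each mask once per sequence and reuse it at every step.
        mask_x = self._mask(batch, x.size(-1), self.p_input, x)
        mask_h = self._mask(batch, self.cell.hidden_size, self.p_hidden, x)

        outputs = []
        for t in range(seq_len):
            h, c = self.cell(x[t] * mask_x, (h * mask_h, c))
            outputs.append(h)
        return torch.stack(outputs)            # (seq_len, batch, hidden_size)

    def _mask(self, batch, dim, p, ref):
        if not self.training or p == 0.0:
            return ref.new_ones(batch, dim)
        # Inverted dropout: keep units with probability 1-p, scale by 1/(1-p).
        return ref.new_empty(batch, dim).bernoulli_(1 - p) / (1 - p)
```

Under the inverted-dropout scaling used here, calling `model.eval()` makes the masks degenerate to ones, so test-time behaviour needs no extra rescaling.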