The Implicit and Explicit Regularization Effects of Dropout
Authors: Colin Wei, Sham Kakade, Tengyu Ma
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This work shows that dropout provides both explicit and implicit regularization effects. Detailed experiments are provided in Section 5 showing that these simplified, analytical regularizers can faithfully match and replace dropout for both LSTM and Transformer architectures, on the Penn Treebank, Wikitext-2, and Wikitext-103 datasets. |
| Researcher Affiliation | Collaboration | Stanford University; Microsoft Research & University of Washington. |
| Pseudocode | Yes | Algorithm 1 (DROPOUTk): mini-batch dropout update using k samples of noise. Algorithm 2: the general form of update for combinations of our explicit and implicit regularizers. (Hedged sketches of both appear after this table.) |
| Open Source Code | Yes | Our code is available at https://github.com/cwein3/dropout-analytical. |
| Open Datasets | Yes | We work with Penn Treebank (Marcus et al., 1994), a corpus of 887,521 tokens, and Wikitext-2 (Merity et al., 2016), a corpus of 2,088,628 tokens. |
| Dataset Splits | Yes | We plot the validation accuracy vs. training steps for models trained using DROPOUTk for various values of k. We work with Penn Treebank (Marcus et al., 1994)... and Wikitext-2 (Merity et al., 2016)... |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions basing models and code on prior works (Merity et al., 2017a; 2018) but does not provide specific software dependencies with version numbers (e.g., library or framework names and versions) needed to replicate the experiments. |
| Experiment Setup | Yes | We fix the dropout probability to q = 0.4 for these experiments. To compute the update gradients for our regularizers, we follow the general rule described in Algorithm 2. We specify additional hyperparameters in Section D. |
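
For orientation, here is a minimal sketch of the kind of update Algorithm 1 (DROPOUTk) describes: averaging the dropout loss over k independent noise samples of the same mini-batch before taking one gradient step. It assumes a generic PyTorch setup; the model, optimizer, and `dropout_k_step` helper are illustrative placeholders, not the authors' released code.

```python
import torch
import torch.nn as nn

# Hypothetical model with a dropout layer (p matches the paper's q = 0.4).
model = nn.Sequential(nn.Linear(100, 256), nn.ReLU(),
                      nn.Dropout(p=0.4), nn.Linear(256, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def dropout_k_step(x, y, k=4):
    """One DROPOUT_k-style update: average the dropout loss over k
    independent dropout masks for the same mini-batch, then take a
    single gradient step. k = 1 recovers standard dropout training."""
    model.train()                                # keep dropout active
    optimizer.zero_grad()
    loss = 0.0
    for _ in range(k):
        # Each forward pass samples a fresh dropout mask.
        loss = loss + loss_fn(model(x), y) / k
    loss.backward()
    optimizer.step()
    return loss.item()

# Example mini-batch.
x = torch.randn(32, 100)
y = torch.randint(0, 10, (32,))
dropout_k_step(x, y, k=4)
```

With k = 1 this reduces to ordinary dropout training; the quoted Figure 1 observation concerns how validation performance changes as k grows.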
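Algorithm 2 itself is only named in the quoted text. The sketch below, reusing the hypothetical `model`, `optimizer`, and `loss_fn` from above, shows only the general shape of a loss-plus-explicit-regularizer step: `explicit_reg` is a placeholder for one of the paper's analytical regularizers, and the implicit (noise) component of Algorithm 2 is deliberately omitted. See Section D of the paper and the released repository for the actual update rule and hyperparameters.

```python
def regularized_update(x, y, explicit_reg, lam=1.0):
    """Sketch of a generic update: task loss (no dropout noise) plus an
    explicit regularizer term. `explicit_reg` is a placeholder callable,
    not one of the paper's regularizers; the implicit-regularization
    (noise) part of Algorithm 2 is not reproduced here."""
    model.eval()                 # forward pass without dropout noise
    optimizer.zero_grad()
    objective = loss_fn(model(x), y) + lam * explicit_reg(model, x)
    objective.backward()
    optimizer.step()
    return objective.item()
```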