On Convergence and Generalization of Dropout Training
Authors: Poorya Mianjy, Raman Arora
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 6, we present a sketch of the proofs of our main results; the detailed proofs are deferred to the Appendix. We conclude the paper by providing empirical evidence for our theoretical results in Section 6. The goal of this section is to investigate if dropout indeed compresses the model, as predicted by Theorem 4.2. We train a convolutional neural network with a dropout layer on the top hidden layer, using cross-entropy loss, on the MNIST dataset. |
| Researcher Affiliation | Academia | Poorya Mianjy Department of Computer Science Johns Hopkins University mianjy@jhu.edu Raman Arora Department of Computer Science Johns Hopkins University arora@cs.jhu.edu |
| Pseudocode | Yes | Algorithm 1 (Dropout in Two-Layer Networks). Input: data $S_T = \{(x_t, y_t)\}_{t=1}^T \sim \mathcal{D}^T$; Bernoulli masks $B_T = \{B_t\}_{t=1}^T$; dropout rate $1-q$; max-norm constraint parameter $c$; learning rate $\eta$. 1: initialize: $w_{r,1} \sim \mathcal{N}(0, I)$ and $a_r \sim \mathrm{Unif}(\{+1, -1\})$, $r \in [m]$. 2: for $t = 1, \ldots, T-1$ do. 3: forward: $g(W_t; x_t, B_t) = \frac{1}{\sqrt{m}} a^\top B_t \sigma(W_t x_t)$. 4: backward: $\nabla L_t(W_t) = \nabla \ell(y_t g(W_t; x_t, B_t)) = \ell'(y_t g(W_t; x_t, B_t))\, y_t \nabla g(W_t; x_t, B_t)$. 5: update: $W_{t+\frac{1}{2}} \leftarrow W_t - \eta \nabla L_t(W_t)$. 6: max-norm: $W_{t+1} \leftarrow \Pi_c(W_{t+\frac{1}{2}})$. 7: end for. Test time: re-scale the weights as $W_T \leftarrow q\, W_T$. (A hedged sketch of this training loop follows the table.) |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We train a convolutional neural network with a dropout layer on the top hidden layer, using cross-entropy loss, on the MNIST dataset. |
| Dataset Splits | No | The paper mentions using the MNIST dataset but does not explicitly describe the training, validation, or test split percentages or methodology beyond stating that it tracks 'test accuracy'. |
| Hardware Specification | No | The paper does not specify any particular hardware (CPU, GPU models, or cloud computing instances with their specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch' as a machine learning framework in a footnote, but does not provide any specific version numbers for it or any other software dependencies. |
| Experiment Setup | Yes | We use a constant learning rate $\eta = 0.01$ and batch-size equal to 64 for all the experiments. We train several networks where except for the top layer widths ($m \in \{100, 500, 1\mathrm{K}, 5\mathrm{K}, 10\mathrm{K}, 50\mathrm{K}, 100\mathrm{K}, 250\mathrm{K}\}$), all other architectural parameters are fixed. We run the experiments for several values of the dropout rate, $1 - p \in \{0.1, 0.2, 0.3, \ldots, 0.9\}$. (See the PyTorch sketch of this sweep after the table.) |
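
To make the quoted Algorithm 1 concrete, below is a minimal NumPy sketch of the dropout training loop for a two-layer ReLU network. It is an illustrative reading of the pseudocode, not the authors' code: the logistic loss, the synthetic-data usage example, and the interpretation of the max-norm projection $\Pi_c$ as a per-row norm clip are all assumptions.

```python
# Minimal sketch of Algorithm 1 (dropout SGD on a two-layer ReLU network).
# Assumptions: logistic loss, raw Bernoulli(q) masks with the test-time
# rescaling W_T <- q * W_T from the quoted algorithm, and Pi_c read as a
# per-row norm clip. Illustrative only, not the authors' implementation.
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def dropout_sgd(X, y, m=100, q=0.8, c=10.0, eta=0.01, seed=0):
    """X: (T, d) inputs; y: (T,) labels in {-1, +1}; q: retention prob (dropout rate 1 - q)."""
    rng = np.random.default_rng(seed)
    T, d = X.shape
    W = rng.standard_normal((m, d))                      # w_{r,1} ~ N(0, I)
    a = rng.choice([-1.0, 1.0], size=m)                  # a_r ~ Unif({+1, -1}), kept fixed
    for t in range(T - 1):
        x_t, y_t = X[t], y[t]
        b_t = rng.binomial(1, q, size=m).astype(float)   # diagonal of the Bernoulli mask B_t
        h = relu(W @ x_t)                                # sigma(W_t x_t)
        g = (a * b_t) @ h / np.sqrt(m)                   # forward: g(W_t; x_t, B_t)
        # backward: grad of ell(z) = log(1 + exp(-z)) at z = y_t * g, chained through g
        ell_prime = -1.0 / (1.0 + np.exp(y_t * g))
        grad_W = (ell_prime * y_t / np.sqrt(m)) * np.outer(a * b_t * (h > 0), x_t)
        W = W - eta * grad_W                             # update: W_{t+1/2}
        row_norms = np.linalg.norm(W, axis=1, keepdims=True)
        W = W * np.minimum(1.0, c / np.maximum(row_norms, 1e-12))  # max-norm: Pi_c
    return q * W, a                                      # test time: W_T <- q * W_T

# Toy usage on synthetic data (illustrative only).
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 20))
y = np.sign(X[:, 0] + 1e-12)
W_final, a_final = dropout_sgd(X, y)
```

Note that with raw Bernoulli(q) masks and a positively homogeneous activation, rescaling the first-layer weights by q at test time is equivalent to scaling each hidden unit by its retention probability, which is why the quoted algorithm applies the rescaling to $W_T$.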
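
The Experiment Setup row above describes a grid over top-layer widths and dropout rates with a fixed optimizer configuration. The PyTorch sketch below shows one plausible instantiation; the convolutional backbone, layer sizes, and the choice of plain SGD are assumptions, and only the quoted hyperparameters (constant learning rate 0.01, batch size 64, the width and dropout-rate grids, cross-entropy loss) come from the text.

```python
# Hypothetical PyTorch sketch of the quoted sweep: a small CNN for MNIST with
# a dropout layer on the top hidden layer. The backbone is assumed; only
# lr = 0.01, batch size 64, and the width / dropout-rate grids are quoted.
import torch
import torch.nn as nn

def make_model(top_width: int, drop_rate: float) -> nn.Module:
    return nn.Sequential(
        nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(64 * 7 * 7, top_width), nn.ReLU(),  # top hidden layer of width m
        nn.Dropout(p=drop_rate),                      # dropout rate 1 - p
        nn.Linear(top_width, 10),
    )

TOP_LAYER_WIDTHS = [100, 500, 1_000, 5_000, 10_000, 50_000, 100_000, 250_000]
DROPOUT_RATES = [round(0.1 * k, 1) for k in range(1, 10)]  # 1 - p in {0.1, ..., 0.9}
BATCH_SIZE = 64                                            # used when building the MNIST loader

# One point on the grid; the paper sweeps every (width, dropout-rate) pair.
model = make_model(top_width=1_000, drop_rate=0.5)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # constant learning rate eta = 0.01
criterion = nn.CrossEntropyLoss()                          # cross-entropy loss
```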