On Convergence and Generalization of Dropout Training

Authors: Poorya Mianjy, Raman Arora

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Section 6, we present a sketch of the proofs of our main results; the detailed proofs are deferred to the Appendix. We conclude the paper by providing empirical evidence for our theoretical results in Section 6. The goal of this section is to investigate if dropout indeed compresses the model, as predicted by Theorem 4.2. We train a convolutional neural network with a dropout layer on the top hidden layer, using cross-entropy loss, on the MNIST dataset. (An illustrative PyTorch model sketch appears after the table.)
Researcher Affiliation | Academia | Poorya Mianjy, Department of Computer Science, Johns Hopkins University, mianjy@jhu.edu; Raman Arora, Department of Computer Science, Johns Hopkins University, arora@cs.jhu.edu
Pseudocode | Yes | Algorithm 1: Dropout in Two-Layer Networks.
Input: data S_T = {(x_t, y_t)}_{t=1}^T ∼ D^T; Bernoulli masks B_T = {B_t}_{t=1}^T; dropout rate 1 − q; max-norm constraint parameter c; learning rate η
1: initialize: w_{r,1} ∼ N(0, I) and a_r ∼ Unif({+1, −1}), r ∈ [m]
2: for t = 1, ..., T − 1 do
3:   forward: g(W_t; x_t, B_t) = (1/√m) a^T B_t σ(W_t x_t)
4:   backward: ∇L_t(W_t) = ∇ℓ(y_t g(W_t; x_t, B_t)) = ℓ'(y_t g(W_t; x_t, B_t)) y_t ∇g(W_t; x_t, B_t)
5:   update: W_{t+1/2} ← W_t − η ∇L_t(W_t)
6:   max-norm: W_{t+1} ← Π_c(W_{t+1/2})
7: end for
Test time: re-scale the weights as W_t ← q W_t. (A runnable NumPy sketch of this algorithm appears after the table.)
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We train a convolutional neural network with a dropout layer on the top hidden layer, using cross-entropy loss, on the MNIST dataset.
Dataset Splits | No | The paper mentions using the MNIST dataset but does not explicitly describe training, validation, or test split percentages or methodology beyond stating that it tracks 'test accuracy'.
Hardware Specification | No | The paper does not specify any particular hardware (CPU or GPU models, or cloud computing instances with their specifications) used for running the experiments.
Software Dependencies | No | The paper mentions PyTorch as a machine learning framework in a footnote, but does not provide version numbers for it or for any other software dependency.
Experiment Setup | Yes | We use a constant learning rate η = 0.01 and batch-size equal to 64 for all the experiments. We train several networks where, except for the top layer widths (m ∈ {100, 500, 1K, 5K, 10K, 50K, 100K, 250K}), all other architectural parameters are fixed. We run the experiments for several values of the dropout rate, 1 − p ∈ {0.1, 0.2, 0.3, ..., 0.9}. (A sketch enumerating this grid appears after the table.)
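The Pseudocode row transcribes Algorithm 1. Below is a minimal NumPy sketch of that training loop, not the authors' implementation: it assumes ℓ is the logistic loss, σ is the ReLU, and that the max-norm projection Π_c clips each row of W to ℓ2-norm at most c; the function name dropout_sgd and the data interface are ours.

```python
# Minimal sketch of Algorithm 1 (Dropout in Two-Layer Networks) in NumPy.
# Assumptions (not specified in the quoted row): logistic loss, ReLU activation,
# and a max-norm projection that clips each row of W to l2-norm at most c.
import numpy as np

def dropout_sgd(data, q=0.8, c=10.0, eta=0.01, m=1000, seed=0):
    """data: list of (x, y) pairs with x in R^d and y in {+1, -1}."""
    rng = np.random.default_rng(seed)
    d = len(data[0][0])
    W = rng.standard_normal((m, d))        # initialize: w_{r,1} ~ N(0, I)
    a = rng.choice([+1.0, -1.0], size=m)   # a_r ~ Unif({+1, -1})

    for x, y in data:
        B = rng.binomial(1, q, size=m)     # Bernoulli mask; keep probability q, dropout rate 1 - q
        pre = W @ x                        # pre-activations W_t x_t
        act = np.maximum(pre, 0.0)         # sigma = ReLU
        g = (a * B) @ act / np.sqrt(m)     # forward: g(W_t; x_t, B_t) = (1/sqrt(m)) a^T B_t sigma(W_t x_t)

        # backward: grad L_t(W_t) = l'(y_t g) * y_t * grad g, with logistic loss l(z) = log(1 + e^{-z})
        lprime = -1.0 / (1.0 + np.exp(y * g))
        grad = np.outer(lprime * y * a * B * (pre > 0) / np.sqrt(m), x)

        W = W - eta * grad                 # update: W_{t+1/2} <- W_t - eta * grad L_t(W_t)
        norms = np.linalg.norm(W, axis=1, keepdims=True)
        W = W * np.minimum(1.0, c / np.maximum(norms, 1e-12))  # max-norm: W_{t+1} <- Pi_c(W_{t+1/2})

    return q * W, a                        # test time: re-scale the weights as W <- q W
```

The returned first-layer weights are already re-scaled by the keep probability q, matching the algorithm's test-time step.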
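The experiment quoted in the Research Type and Open Datasets rows (a convolutional network with dropout only on the top hidden layer, trained with cross-entropy on MNIST) could look roughly like the PyTorch sketch below. Only the dropout placement and the loss come from the paper; the channel counts, kernel sizes, and default width m are illustrative assumptions.

```python
# Illustrative PyTorch model: a small CNN for MNIST with dropout applied only
# to the top hidden layer, to be trained with nn.CrossEntropyLoss.
# Architectural details other than the dropout placement are assumptions.
import torch
import torch.nn as nn

class DropoutCNN(nn.Module):
    def __init__(self, m=1000, dropout_rate=0.5):  # dropout_rate = 1 - p, the probability of dropping a unit
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.top_hidden = nn.Linear(64 * 7 * 7, m)  # top hidden layer of width m
        self.dropout = nn.Dropout(p=dropout_rate)   # dropout only on the top hidden layer
        self.classifier = nn.Linear(m, 10)

    def forward(self, x):
        h = self.features(x).flatten(1)
        h = torch.relu(self.top_hidden(h))
        h = self.dropout(h)
        return self.classifier(h)                   # logits for the cross-entropy loss
```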
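The Experiment Setup row pins down the learning rate, batch size, top-layer widths, and dropout rates; the sketch below only enumerates that grid. The itertools.product enumeration and the run-dictionary format are illustrative conveniences, not the authors' tooling.

```python
# Enumerate the hyperparameter grid quoted in the Experiment Setup row.
# Only the four values below come from the paper; everything else is scaffolding.
import itertools

LEARNING_RATE = 0.01                                                  # constant eta for all runs
BATCH_SIZE = 64
WIDTHS = [100, 500, 1_000, 5_000, 10_000, 50_000, 100_000, 250_000]   # top-layer widths m
DROPOUT_RATES = [round(0.1 * k, 1) for k in range(1, 10)]             # dropout rate 1 - p in {0.1, ..., 0.9}

# One configuration per (width, dropout rate) pair; all other architectural
# parameters are held fixed across runs.
runs = [{"m": m, "dropout_rate": rate, "lr": LEARNING_RATE, "batch_size": BATCH_SIZE}
        for m, rate in itertools.product(WIDTHS, DROPOUT_RATES)]
print(len(runs))  # 8 widths x 9 dropout rates = 72 configurations
```

Each configuration would then be trained as in the model sketch above, e.g. DropoutCNN(m=run["m"], dropout_rate=run["dropout_rate"]).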