Dropout: Explicit Forms and Capacity Control

Authors: Raman Arora, Peter Bartlett, Poorya Mianjy, Nathan Srebro

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide extensive numerical evaluations for validating our theory, including verifying that the proposed theoretical bound on the Rademacher complexity is predictive of the observed generalization gap, as well as highlighting how dropout breaks co-adaptation, a notion that was the main motivation behind the invention of dropout (Hinton et al., 2012).
Researcher Affiliation | Academia | (1) Johns Hopkins University; (2) University of California, Berkeley; (3) TTI Chicago.
Pseudocode | No | The paper describes the methods and processes mathematically and textually, but it does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any statements about releasing source code for the methodology or provide a link to a code repository.
Open Datasets | Yes | We evaluate dropout on the MovieLens dataset (Harper & Konstan, 2016), a publicly available collaborative filtering dataset... We train 2-layer neural networks with and without dropout, on the MNIST dataset of handwritten digits and the Fashion-MNIST dataset of Zalando's article images. (See the code sketches below the table.)
Dataset Splits | No | The paper mentions training and test data, but does not specify details for a separate validation split, nor does it provide exact percentages or counts for how the datasets were partitioned for training, validation, and testing.
Hardware Specification | No | The paper does not mention any specific hardware used for running the experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., specific libraries, frameworks, or operating systems).
Experiment Setup | Yes | We train the model for 100 epochs over the training data, where we use a fixed learning rate of lr = 1, and a batch size of 2000... We initialize the factors using the standard He initialization scheme (He et al., 2015)... The learning rate in all experiments is set to lr = 1e-3. We train the models for 30 epochs over the training set. (Hedged sketches of both setups appear below this table.)
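
For the collaborative-filtering setup quoted above, the following is a minimal sketch, assuming PyTorch, of how dropout-regularized matrix factorization could be trained with the reported hyperparameters (He initialization of the factors, fixed learning rate 1, batch size 2000, 100 epochs). The rank, dropout rate, matrix dimensions, and exact placement of dropout are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of dropout-regularized matrix factorization on a
# MovieLens-style rating matrix (not the authors' implementation).
# Reported settings: He initialization of the factors, fixed lr = 1,
# batch size 2000, 100 epochs. Rank, dropout rate, matrix sizes, and
# where dropout is applied are assumptions for illustration.
import torch
import torch.nn as nn

n_users, n_items, rank = 6040, 3706, 50        # assumed MovieLens-like sizes
U = nn.Parameter(torch.empty(n_users, rank))
V = nn.Parameter(torch.empty(n_items, rank))
nn.init.kaiming_normal_(U)                     # He initialization of the factors
nn.init.kaiming_normal_(V)

dropout = nn.Dropout(p=0.5)                    # dropout rate is an assumption
opt = torch.optim.SGD([U, V], lr=1.0)          # fixed learning rate of 1

def train(users, items, ratings, epochs=100, batch_size=2000):
    """Mini-batch SGD over observed (user, item, rating) index triples."""
    for _ in range(epochs):
        perm = torch.randperm(len(ratings))
        for start in range(0, len(ratings), batch_size):
            idx = perm[start:start + batch_size]
            # Dropout masks latent (rank) dimensions of each prediction.
            pred = dropout(U[users[idx]] * V[items[idx]]).sum(dim=1)
            loss = ((pred - ratings[idx]) ** 2).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
```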
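Likewise, here is a minimal sketch, again assuming PyTorch, of a 2-layer network with dropout trained on MNIST with the reported learning rate of 1e-3 for 30 epochs; the hidden width, dropout rate, optimizer choice, and batch size are assumptions not stated in the rows above.

```python
# Hedged sketch of a 2-layer network with dropout (not the authors'
# implementation). Reported settings: lr = 1e-3, 30 epochs. Hidden width,
# dropout rate, optimizer, and batch size are assumptions for illustration.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 1024),   # hidden width is an assumption
    nn.ReLU(),
    nn.Dropout(p=0.5),          # dropout on the hidden layer; rate assumed
    nn.Linear(1024, 10),
)
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

train_set = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=128, shuffle=True)

model.train()                   # keeps dropout active during training
for epoch in range(30):
    for x, y in loader:
        loss = loss_fn(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Swapping datasets.MNIST for datasets.FashionMNIST gives the corresponding Fashion-MNIST variant of the same setup.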