On the Implicit Bias of Dropout

Authors: Poorya Mianjy, Raman Arora, Rene Vidal

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We specialize our results to matrix factorization in Section 5, and in Section 6, we discuss preliminary experiments to support our theoretical results.
Researcher Affiliation | Academia | (1) Department of Computer Science, Johns Hopkins University, Baltimore, USA; (2) Department of Biomedical Engineering, Johns Hopkins University, Baltimore, USA.
Pseudocode | Yes | Algorithm 1: Dropout with Stochastic Gradient Descent; Algorithm 2: EQZ(U), the equalizer of an auto-encoder h_{U,U}; Algorithm 3: Polynomial-time solver for Problem 7. (A hedged sketch of Algorithm 1 appears below the table.)
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | The input x ∈ ℝ^80 is sampled from the standard normal distribution, and the output y ∈ ℝ^120 is generated as y = Mx, where M ∈ ℝ^(120×80) is drawn randomly by uniformly sampling its left and right singular subspaces, with an exponentially decaying spectrum. The paper describes how the data were generated for the experiments but does not use a publicly available dataset with concrete access information. (A generation sketch appears below the table.)
Dataset Splits | No | The paper does not specify split percentages or sample counts for training, validation, or test sets. It describes how the data were generated but not how they were partitioned.
Hardware Specification | No | The paper does not provide any details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., Python 3.x, PyTorch x.x, CUDA x.x) required to replicate the experiments.
Experiment Setup | Yes | Figure 3 illustrates the behavior of Algorithm 1 for different values of the regularization parameter (λ ∈ {0.1, 0.5, 1}) and for different factor sizes (r ∈ {20, 80}). The blue curve shows the objective value for the dropout iterates, and the red line shows the optimal objective value (i.e., the objective at a global optimum found using Theorem 3.6). All plots are averaged over 50 runs of Algorithm 1 (over random initializations, random realizations of Bernoulli dropout, and random draws of training examples). (The parameter sweep is sketched below.)
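
For concreteness, the synthetic task quoted in the Open Datasets row can be reproduced along the following lines. This is a minimal sketch, not the authors' code: the sample count n, the exponential decay rate of the spectrum, and the use of QR factorizations of Gaussian matrices to draw uniform singular subspaces are assumptions not pinned down by the quoted text.

```python
import numpy as np

def make_data(n, d_in=80, d_out=120, decay=0.9, seed=0):
    """Sample the synthetic task: x ~ N(0, I_80), y = M x with
    M in R^(120x80) having random singular subspaces and an
    exponentially decaying spectrum (decay rate is an assumption)."""
    rng = np.random.default_rng(seed)
    # QR of a Gaussian matrix yields (Haar-)uniformly distributed
    # orthonormal columns, i.e., uniform singular subspaces.
    Uo, _ = np.linalg.qr(rng.standard_normal((d_out, d_in)))
    Vo, _ = np.linalg.qr(rng.standard_normal((d_in, d_in)))
    s = decay ** np.arange(d_in)           # exponentially decaying spectrum
    M = Uo @ (s[:, None] * Vo.T)           # M = Uo diag(s) Vo^T
    X = rng.standard_normal((d_in, n))     # columns are i.i.d. inputs x
    Y = M @ X                              # outputs y = M x
    return X, Y, M
```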
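The Pseudocode row lists Algorithm 1, dropout with stochastic gradient descent. A minimal sketch of that training loop on the single-hidden-layer linear network h(x) = U Vᵀx is given below; the step size, iteration count, Gaussian initialization scale, and the inverted-dropout rescaling of the mask by 1/θ are assumptions rather than details taken from the paper.

```python
def dropout_sgd(X, Y, r, theta=0.5, lr=1e-2, iters=20000, seed=0):
    """Sketch of dropout + SGD on h(x) = U V^T x with r hidden units
    and retain probability theta (hyperparameters are assumptions)."""
    rng = np.random.default_rng(seed)
    d_in, n = X.shape
    d_out = Y.shape[0]
    U = rng.standard_normal((d_out, r)) / np.sqrt(r)
    V = rng.standard_normal((d_in, r)) / np.sqrt(r)
    for _ in range(iters):
        i = rng.integers(n)                         # draw one training example
        x, y = X[:, i], Y[:, i]
        b = rng.binomial(1, theta, size=r) / theta  # rescaled Bernoulli mask
        h = b * (V.T @ x)                           # dropped-out hidden layer
        resid = U @ h - y
        # Gradients of ||U diag(b) V^T x - y||^2 (factor 2 folded into lr)
        U -= lr * np.outer(resid, h)
        V -= lr * np.outer(x, b * (U.T @ resid))
    return U, V
```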
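Finally, the sweep described in the Experiment Setup row can be organized as below, reusing make_data and dropout_sgd from the sketches above. Two assumptions are layered on top of the quoted setup: the retain probability is recovered as θ = 1/(1 + λ), from the paper's λ = (1 − θ)/θ relation, and the tracked objective adds the paper's explicit regularizer λ Σᵢ ‖uᵢ‖²‖vᵢ‖² to the empirical squared loss; the sample size is likewise an assumption.

```python
lambdas, ranks, runs = (0.1, 0.5, 1.0), (20, 80), 50

for lam in lambdas:
    theta = 1.0 / (1.0 + lam)   # assumes lambda = (1 - theta) / theta
    for r in ranks:
        objs = []
        for run in range(runs):
            X, Y, M = make_data(n=1000, seed=run)   # n is an assumption
            U, V = dropout_sgd(X, Y, r, theta=theta, seed=run)
            sq_loss = np.mean(np.sum((U @ V.T @ X - Y) ** 2, axis=0))
            # lambda * sum_i ||u_i||^2 ||v_i||^2, the paper's regularizer
            reg = lam * np.sum((U ** 2).sum(0) * (V ** 2).sum(0))
            objs.append(sq_loss + reg)
        print(f"lambda={lam}, r={r}: mean objective {np.mean(objs):.4f}")
```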