Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Surprising properties of dropout in deep networks
Authors: David P. Helmbold, Philip M. Long
JMLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To complement our theoretical results we performed two sets of experiments. The first set tests the scale dependence of dropout and weight decay, while the second set examines dropout's promotion of negative weights even when learning monotone functions. The code is accessible at https://www.dropbox.com/sh/6s2lcfrq17zshmp/AAAQ06uDa4gOAuAnw2MAghEMa?dl=0 |
| Researcher Affiliation | Collaboration | David P. Helmbold (EMAIL), Department of Computer Science, University of California, Santa Cruz, Santa Cruz, CA 95064, USA; Philip M. Long (EMAIL), Google, 1600 Amphitheatre Parkway, Mountain View, CA 94043, USA |
| Pseudocode | No | The paper describes methods, theorems, lemmas, and proofs in prose and mathematical formulations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is accessible at https://www.dropbox.com/sh/6s2lcfrq17zshmp/AAAQ06uDa4gOAuAnw2MAghEMa?dl=0 |
| Open Datasets | No | The paper describes generating training examples uniformly at random from [−1, 1]^K, or defines a small custom training set with specific inputs and labels. It does not provide concrete access information (link, DOI, citation) to a publicly available or open dataset. |
| Dataset Splits | No | The paper describes generating or defining small custom training sets (e.g., 'Ten training examples were generated', 'The training set consists of six inputs'). It does not provide specific details on training/test/validation splits, percentages, or references to predefined splits for reproduction. |
| Hardware Specification | No | The paper mentions that simulations were 'implemented using Torch' and 'with Keras on top of TensorFlow', but it does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running the experiments. |
| Software Dependencies | No | The paper mentions software frameworks like Torch, Keras, and TensorFlow, and the optim package for Torch, but it does not specify any version numbers for these software components. |
| Experiment Setup | Yes | We used stochastic gradient descent using the optim package for Torch, with learning rate 0.01/(1 + 0.00001t) and momentum of 0.5, and a maximum of 100,000 iterations. We used the standard architecture with K = 5 inputs, depth d = 2, n = 5 hidden nodes. W_D was trained with dropout probability 1/2 and no weight decay. W_2 was trained with weight decay with λ = 1/2 and no dropout. W_none was trained without any regularization. These experiments were implemented with Keras on top of TensorFlow and using SGD optimization with a learning rate of 0.005. Weight decay learning used a parameter of 0.05, and dropout training used a dropout rate of 0.5. We used the standard architecture with six inputs, 12 hidden nodes, and one output. We ran each of dropout and weight-decay for 10,000 epochs. |
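For readers reproducing the Torch-based setup, the quoted learning-rate schedule and momentum update can be sketched as follows. This is an illustrative sketch only (the function names and the toy quadratic objective are ours, not from the paper), assuming the schedule reads 0.01/(1 + 0.00001t) and that momentum is the standard heavy-ball variant:

```python
# Sketch (not the authors' code): decayed learning rate and momentum SGD,
# demonstrated on a toy 1-D quadratic loss rather than the paper's networks.
def learning_rate(t, base=0.01, decay=1e-5):
    """Decay schedule as quoted in the setup: lr_t = base / (1 + decay * t)."""
    return base / (1.0 + decay * t)

def sgd_momentum(grad_fn, w0, steps=1000, momentum=0.5):
    """Plain SGD with momentum 0.5, stepping with the schedule above."""
    w, v = w0, 0.0
    for t in range(steps):
        v = momentum * v - learning_rate(t) * grad_fn(w)
        w = w + v
    return w

# Toy usage: minimize f(w) = (w - 3)^2 (gradient 2*(w - 3)); w approaches 3.
w_star = sgd_momentum(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

With decay 1e-5 the learning rate halves only after 100,000 steps, matching the quoted iteration cap, so early training runs at essentially the base rate.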