Toward Understanding the Importance of Noise in Training Neural Networks
Authors: Mo Zhou, Tianyi Liu, Yan Li, Dachao Lin, Enlu Zhou, Tuo Zhao
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments are provided to support our theory. |
| Researcher Affiliation | Academia | ¹Peking University; ²Georgia Institute of Technology. |
| Pseudocode | Yes | Algorithm 1 Perturbed Gradient Descent Algorithm with Noise Annealing |
| Open Source Code | No | The paper does not provide an explicit statement about the release of source code or a link to a code repository. |
| Open Datasets | No | The paper describes the generation of synthetic data ('training data is generated from a teacher network', 'independent Gaussian input'), but it does not specify a publicly available or open dataset that can be accessed via a link, DOI, or standard citation. |
| Dataset Splits | No | The paper mentions 'training' but does not provide specific details about training/validation/test splits or a separate validation set. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers (e.g., programming languages, libraries, frameworks with versions). |
| Experiment Setup | Yes | For the perturbed GD algorithm, we perform step size and noise annealing in an epoch-wise fashion: each simulation has 20 epochs with each epoch consisting of 400 iterations; the initial learning rate is 0.1 for both w and a, and geometrically decays with a ratio 0.8; the initial noise levels are given by (ρw, ρa) = (36, 1) and both geometrically decay with a ratio 0.4. For GD, the learning rate is 0.1 for both w and a. For SGD, we adopt a batch size of 4, and perform step size annealing in an epoch-wise fashion: the initial learning rate is 0.1, and geometrically decays with a ratio 0.4. |
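
The epoch-wise annealing described in the Experiment Setup row can be tabulated with a short Python sketch. This is an illustration, not the authors' code: the names `EpochSetting`, `perturbed_gd_schedule`, and `sgd_lr_schedule` are hypothetical. It also assumes that the first epoch uses the initial values and that each later epoch multiplies them by the decay ratio once. The SGD epoch count is not stated in the quoted setup and is assumed here to be 20 as well.

```python
from dataclasses import dataclass

# Values quoted in the Experiment Setup row.
NUM_EPOCHS = 20
ITERATIONS_PER_EPOCH = 400


@dataclass(frozen=True)
class EpochSetting:
    """Hyperparameters held fixed for every iteration of one epoch (hypothetical container)."""
    epoch: int
    lr: float      # learning rate, shared by w and a
    rho_w: float   # noise level for w
    rho_a: float   # noise level for a


def perturbed_gd_schedule(num_epochs=NUM_EPOCHS, lr0=0.1, lr_decay=0.8,
                          rho0=(36.0, 1.0), rho_decay=0.4):
    """Geometric decay per epoch; epoch 0 uses the initial values (assumption)."""
    return [
        EpochSetting(
            epoch=e,
            lr=lr0 * lr_decay ** e,
            rho_w=rho0[0] * rho_decay ** e,
            rho_a=rho0[1] * rho_decay ** e,
        )
        for e in range(num_epochs)
    ]


def sgd_lr_schedule(num_epochs=NUM_EPOCHS, lr0=0.1, lr_decay=0.4):
    """SGD step-size annealing; the epoch count is an assumption, not from the source."""
    return [lr0 * lr_decay ** e for e in range(num_epochs)]


schedule = perturbed_gd_schedule()
sgd_lrs = sgd_lr_schedule()
total_iterations = len(schedule) * ITERATIONS_PER_EPOCH

if __name__ == "__main__":
    # Show the first few epochs of the perturbed GD schedule.
    for s in schedule[:3]:
        print(f"epoch {s.epoch}: lr={s.lr:.4f}, rho_w={s.rho_w:.4f}, rho_a={s.rho_a:.4f}")
```

Under these assumptions the noise levels shrink much faster than the learning rate (ratio 0.4 versus 0.8), so the late epochs behave almost like plain gradient descent with a small step size.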