Toward Understanding the Importance of Noise in Training Neural Networks

Authors: Mo Zhou, Tianyi Liu, Yan Li, Dachao Lin, Enlu Zhou, Tuo Zhao

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments are provided to support our theory.
Researcher Affiliation | Academia | Peking University; Georgia Institute of Technology.
Pseudocode | Yes | Algorithm 1: Perturbed Gradient Descent Algorithm with Noise Annealing.
Open Source Code | No | The paper does not provide an explicit statement about the release of source code or a link to a code repository.
Open Datasets | No | The paper describes the generation of synthetic data ('training data is generated from a teacher network', 'independent Gaussian input'), but it does not specify a publicly available dataset that can be accessed via a link, DOI, or standard citation.
Dataset Splits | No | The paper mentions training but does not provide details about training/validation/test splits or a separate validation set.
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper does not list specific software dependencies or version numbers (e.g., programming languages, libraries, frameworks).
Experiment Setup | Yes | For the perturbed GD algorithm, step size and noise annealing are performed epoch-wise: each simulation runs 20 epochs of 400 iterations each; the initial learning rate is 0.1 for both w and a and decays geometrically with ratio 0.8; the initial noise levels are (ρ_w, ρ_a) = (36, 1) and both decay geometrically with ratio 0.4. For GD, the learning rate is 0.1 for both w and a. For SGD, the batch size is 4 and step size annealing is performed epoch-wise: the initial learning rate is 0.1 and decays geometrically with ratio 0.4.
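The Experiment Setup row pins down the annealing schedule but not the network or the exact update rule of Algorithm 1, so the minimal sketch below only lays out that epoch-wise schedule as a training loop. The loss (a separable quadratic with hypothetical targets w_star and a_star) and the choice to add isotropic noise directly to each gradient step are assumptions made for illustration, not the paper's construction.

```python
import numpy as np

# Minimal sketch of the quoted epoch-wise annealing schedule.
# The paper's experiments train a student network (parameters w and a) on data
# from a teacher network with Gaussian inputs; here a separable quadratic toy
# loss stands in so the quoted constants stay numerically stable, and the form
# of the perturbation (noise added to each gradient step) is an assumption.

rng = np.random.default_rng(0)
d = 10
w_star = rng.standard_normal(d)   # hypothetical target for w
a_star = 1.0                      # hypothetical target for a

def grads(w, a):
    """Gradients of the toy loss 0.5*||w - w_star||^2 + 0.5*(a - a_star)^2."""
    return w - w_star, a - a_star

# Constants quoted in the Experiment Setup row above.
epochs, iters_per_epoch = 20, 400
lr, lr_decay = 0.1, 0.8
rho_w, rho_a, noise_decay = 36.0, 1.0, 0.4

w = np.zeros(d)
a = 0.0
for epoch in range(epochs):
    for _ in range(iters_per_epoch):
        gw, ga = grads(w, a)
        # Perturbed update: gradient plus isotropic noise at the current level.
        w -= lr * (gw + rho_w * rng.standard_normal(d))
        a -= lr * (ga + rho_a * rng.standard_normal())
    # Epoch-wise annealing: step size and noise levels decay geometrically.
    lr *= lr_decay
    rho_w *= noise_decay
    rho_a *= noise_decay
    loss = 0.5 * np.sum((w - w_star) ** 2) + 0.5 * (a - a_star) ** 2
    print(f"epoch {epoch + 1:2d}  loss {loss:.4f}")
```

With the quoted decay ratios, the printed loss on this toy objective falls steadily across the 20 epochs as both the step size and the injected noise shrink.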