Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections

Authors: Alexander Camuto, Xiaoyu Wang, Lingjiong Zhu, Chris Holmes, Mert Gurbuzbalaban, Umut Simsekli

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our empirical results confirm that different types of neural networks trained with GNIs are well-modelled by the proposed dynamics and that the implicit effect of these injections induces a bias that degrades the performance of networks." (A minimal sketch of a GNI layer appears after this table.)
Researcher Affiliation | Academia | 1 Alan Turing Institute, University of Oxford, Oxford, UK; 2 Department of Mathematics, Florida State University, Tallahassee, USA; 3 Department of Management Science and Information Systems, Rutgers Business School, Piscataway, USA; 4 INRIA, Département d'Informatique de l'École Normale Supérieure, PSL Research University, Paris, France.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "See https://github.com/alexander-camuto/asym-heavy-tails-bias-GNI for all code."
Open Datasets | Yes | "We train the networks in Figure 2 on L̃_M (4.5) for the sinusoidal toy-data with additive GNIs [left] and multiplicative GNIs [right]."; Figure 7: "We show the test-set loss for SVHN [bottom] and CIFAR10 [top]..."; Figure 5: "We train the networks in Figure 2 on L̃_M (4.5) for the sinusoidal toy-data with additive GNIs [left] and MNIST with multiplicative GNIs [right]." (A sketch generating such sinusoidal toy data follows the table.)
Dataset Splits | Yes | "Convolutional networks trained with R consistently outperform those trained with GNIs and mini-batching on held-out data, supporting that the implicit effect degrades performance."
Hardware Specification | No | The paper does not report hardware details such as GPU/CPU models, processor types, or memory amounts; the experimental setup is described only in terms of models, datasets, and training parameters.
Software Dependencies | No | The paper mentions scikit-learn and other tools in its references, but it does not specify version numbers for any key software components or libraries required to reproduce the experiments.
Experiment Setup | Yes | "Here, we consider a one-dimensional problem with the quartic potential f(w) = w^4/4 − w^2/2, and simulate (4.4) for 10K iterations with constant step-size η_k = 0.001 and σ = 1."; "Shading is the standard deviation over 5 random seeds."; "We use multiplicative noise of variance σ² and batch size of 512." (A simulation sketch of the quartic-potential experiment follows the table.)
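
For readers unfamiliar with Gaussian noise injections, the following is a minimal PyTorch sketch of a network with additive or multiplicative GNIs at each hidden activation. It is not the authors' code: the layer widths, noise placement, and sigma value are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GNINet(nn.Module):
    """Small MLP with Gaussian noise injected at each hidden activation.

    Illustrative sketch only: layer widths, noise placement, and the
    default sigma are assumptions, not the paper's exact configuration.
    """

    def __init__(self, sigma=0.1, multiplicative=False):
        super().__init__()
        self.fc1 = nn.Linear(1, 128)
        self.fc2 = nn.Linear(128, 128)
        self.out = nn.Linear(128, 1)
        self.sigma = sigma
        self.multiplicative = multiplicative

    def inject(self, h):
        # Noise is only injected during training, not at evaluation time.
        if not self.training:
            return h
        noise = self.sigma * torch.randn_like(h)
        # Additive GNI: h + eps; multiplicative GNI: h * (1 + eps).
        return h * (1.0 + noise) if self.multiplicative else h + noise

    def forward(self, x):
        h = self.inject(torch.relu(self.fc1(x)))
        h = self.inject(torch.relu(self.fc2(h)))
        return self.out(h)
```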
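The sinusoidal toy data quoted in the Open Datasets row is not fully specified in the excerpts; the generator below is a hypothetical stand-in (frequency, input range, noise level, and sample count are all assumptions) meant only to make the regression setting concrete.

```python
import numpy as np

def make_sinusoid(n=1024, noise_std=0.1, seed=0):
    """Hypothetical sinusoidal toy regression set; the recipe in the
    paper's repository may differ (frequency, range, and noise level
    here are assumptions)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-np.pi, np.pi, size=(n, 1))
    y = np.sin(3.0 * x) + noise_std * rng.standard_normal((n, 1))
    return x.astype(np.float32), y.astype(np.float32)

x_train, y_train = make_sinusoid()
```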
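The quartic-potential experiment quoted in the Experiment Setup row can be reproduced in outline with plain NumPy. The noise model below (i.i.d. Gaussian perturbations of the gradient with σ = 1) is an assumed stand-in for the paper's GNI-induced dynamics (Eq. 4.4), not their exact update rule.

```python
import numpy as np

def quartic_grad(w):
    # Gradient of the double-well potential f(w) = w**4 / 4 - w**2 / 2.
    return w**3 - w

rng = np.random.default_rng(0)
eta, sigma, n_iters = 1e-3, 1.0, 10_000  # step-size and noise scale from the quoted setup
w = rng.standard_normal()
trajectory = np.empty(n_iters)
for k in range(n_iters):
    # Noisy gradient step; Gaussian gradient noise is an illustrative
    # stand-in for the dynamics the paper actually simulates.
    w = w - eta * (quartic_grad(w) + sigma * rng.standard_normal())
    trajectory[k] = w

print(f"final iterate: {w:.4f}")
```

Averaging such trajectories over several seeds (the paper uses 5) recovers the kind of shaded-deviation plots described in the quoted captions.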