Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections
Authors: Alexander Camuto, Xiaoyu Wang, Lingjiong Zhu, Chris Holmes, Mert Gurbuzbalaban, Umut Simsekli
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results confirm that different types of neural networks trained with GNIs are well-modelled by the proposed dynamics and that the implicit effect of these injections induces a bias that degrades the performance of networks. (A minimal GNI sketch follows the table.) |
| Researcher Affiliation | Academia | 1 Alan Turing Institute, University of Oxford, Oxford, UK; 2 Department of Mathematics, Florida State University, Tallahassee, USA; 3 Department of Management Science and Information Systems, Rutgers Business School, Piscataway, USA; 4 INRIA, Département d'Informatique de l'École Normale Supérieure, PSL Research University, Paris, France. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | See https://github.com/alexander-camuto/asym-heavy-tails-bias-GNI for all code. |
| Open Datasets | Yes | Figure 5. We train the networks in Figure 2 on ẽ_LM (4.5) for the sinusoidal toy-data with additive GNIs [left] and MNIST with multiplicative GNIs [right]. and Figure 7. We show the test-set loss for SVHN [bottom] and CIFAR10 [top]... |
| Dataset Splits | Yes | Convolutional networks trained with R consistently outperform those trained with GNIs and mini-batching on held-out data, supporting the claim that the implicit effect degrades performance. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments. It describes the experimental setup in terms of models, datasets, and training parameters. |
| Software Dependencies | No | The paper mentions “scikit-learn” and other tools within the references, but it does not specify version numbers for any key software components or libraries required to reproduce the experiments. |
| Experiment Setup | Yes | Here, we consider a one-dimensional problem with the quartic potential f(w) = w⁴/4 − w²/2, and simulate (4.4) for 10K iterations with constant step-size η_k = 0.001 and ε = 1. and Shading is the standard deviation over 5 random seeds. and We use multiplicative noise of variance σ² and batch size of 512. (A minimal simulation sketch follows the table.) |
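
For concreteness, below is a minimal sketch of the Gaussian noise injections (GNIs) the paper studies, written in PyTorch. The module name `GaussianNoiseInjection`, the injection points, and the noise variance are illustrative assumptions, not taken from the paper or its released code; the `mode` flag simply mirrors the paper's distinction between additive and multiplicative injections.

```python
# Minimal sketch of Gaussian noise injections (GNIs), assuming a PyTorch
# setup. The module name, injection points, and sigma are hypothetical,
# not taken from the paper's released code.
import torch
import torch.nn as nn

class GaussianNoiseInjection(nn.Module):
    """Injects Gaussian noise into activations at train time only.

    mode="additive":        h -> h + eps,        eps ~ N(0, sigma^2)
    mode="multiplicative":  h -> h * (1 + eps),  eps ~ N(0, sigma^2)
    """
    def __init__(self, sigma: float = 0.1, mode: str = "additive"):
        super().__init__()
        self.sigma = sigma
        self.mode = mode

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        if not self.training or self.sigma == 0.0:
            return h  # no injection at evaluation time
        eps = torch.randn_like(h) * self.sigma
        if self.mode == "additive":
            return h + eps
        return h * (1.0 + eps)

# Example: a small MLP with noise injected after each hidden activation.
model = nn.Sequential(
    nn.Linear(1, 64), nn.ReLU(), GaussianNoiseInjection(0.1, "additive"),
    nn.Linear(64, 64), nn.ReLU(), GaussianNoiseInjection(0.1, "additive"),
    nn.Linear(64, 1),
)
```

Gating on `self.training` matches the standard practice of injecting noise only during training, so evaluation (e.g. on the held-out splits discussed above) runs the clean network.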
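The Experiment Setup row quotes a one-dimensional simulation on the quartic potential f(w) = w⁴/4 − w²/2. Below is a minimal sketch of such a simulation, assuming plain Euler-type iterates with injected Gaussian noise; the paper's actual recursion (4.4) induces asymmetric heavy-tailed fluctuations, and the initialisation and noise scaling here are assumptions rather than the paper's exact setup.

```python
# Minimal sketch of the quartic-potential experiment, assuming gradient
# steps with injected Gaussian noise; the paper's exact recursion (4.4)
# and the role of its parameter epsilon are not reproduced here.
import numpy as np

def grad_f(w):
    # Gradient of the quartic potential f(w) = w**4 / 4 - w**2 / 2.
    return w**3 - w

eta = 0.001       # constant step-size, as quoted in the paper
n_iters = 10_000  # 10K iterations, as quoted
sigma = 1.0       # assumed noise scale standing in for the quoted "eps = 1"

trajectories = []
for seed in range(5):  # 5 random seeds, matching the quoted shading
    rng = np.random.default_rng(seed)
    w = rng.normal()  # assumed initialisation
    path = np.empty(n_iters)
    for k in range(n_iters):
        noise = sigma * np.sqrt(eta) * rng.normal()  # Euler-Maruyama scaling
        w = w - eta * grad_f(w) + noise
        path[k] = w
    trajectories.append(path)

trajectories = np.stack(trajectories)
print("final mean:", trajectories[:, -1].mean(),
      "final std:", trajectories[:, -1].std())
```

The per-seed standard deviation computed here corresponds to the shading described in the quoted setup; swapping the Gaussian increments for the GNI-induced noise is what produces the asymmetric heavy tails the paper analyses.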