A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks
Authors: Umut Simsekli, Levent Sagun, Mert Gurbuzbalaban
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate the α-stable assumption, we conduct experiments on common deep learning scenarios and show that in all settings, the GN is highly non-Gaussian and admits heavy-tails. We investigate the tail behavior in varying network architectures and sizes, loss functions, and datasets. (See the tail-index estimator sketch after this table.) |
| Researcher Affiliation | Academia | 1 LTCI, Télécom ParisTech, Université Paris-Saclay, 75013 Paris, France; 2 Institute of Physics, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland; 3 Department of Management Science and Information Systems, Rutgers Business School, NJ 08854, USA. |
| Pseudocode | No | No pseudocode or algorithm block is explicitly labeled or provided. |
| Open Source Code | Yes | The codebase is implemented in python using pytorch and provided in https://github.com/umutsimsekli/sgd_tail_index. |
| Open Datasets | Yes | We randomly split the MNIST dataset into train and test parts of sizes 60K and 10K, and CIFAR10 and CIFAR100 datasets into train and test parts of sizes 50K and 10K, respectively. |
| Dataset Splits | No | We randomly split the MNIST dataset into train and test parts of sizes 60K and 10K, and CIFAR10 and CIFAR100 datasets into train and test parts of sizes 50K and 10K, respectively. |
| Hardware Specification | No | Total runtime is 3 weeks on 8 relatively modern GPUs. |
| Software Dependencies | No | The codebase is implemented in python using pytorch and provided in https://github.com/umutsimsekli/sgd_tail_index. |
| Experiment Setup | Yes | Unless stated otherwise, we set the minibatch size b = 500 and the step-size η = 0.1. For both fully connected and convolutional settings, we run each configuration with the negative-log-likelihood (i.e. cross entropy) and with the linear hinge loss, and we repeat each experiment with three different random seeds. (See the training-loop sketch after this table.) |
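
The tail-index measurements quoted above rely on a moment-based estimator for symmetric α-stable samples (the paper attributes it to Mohammadi et al., 2015). The sketch below is a minimal NumPy rendition of that estimator; the function name `estimate_alpha`, the default group size `K1`, and the `eps` guard are our own illustrative choices, not the authors' code.

```python
# Hedged sketch of a moment-based tail-index estimator for symmetric
# alpha-stable samples, of the kind the paper applies to gradient noise.
import numpy as np

def estimate_alpha(x, K1=100):
    """Estimate the tail index alpha of (approximately) i.i.d. samples
    assumed to follow a symmetric alpha-stable distribution.

    x  : 1-D array of samples, e.g. flattened gradient-noise coordinates.
    K1 : group size used to form partial sums.
    """
    x = np.asarray(x).ravel()
    K2 = len(x) // K1                 # number of groups
    x = x[: K1 * K2]                  # drop the remainder
    # For alpha-stable data, the sum of K1 samples scales like K1**(1/alpha),
    # so comparing log-magnitudes of sums and raw samples recovers 1/alpha.
    y = x.reshape(K2, K1).sum(axis=1)
    eps = 1e-12                       # guard against log(0)
    inv_alpha = (np.log(np.abs(y) + eps).mean()
                 - np.log(np.abs(x) + eps).mean()) / np.log(K1)
    return 1.0 / inv_alpha
```

On Gaussian input (α = 2, e.g. `estimate_alpha(np.random.randn(100_000))`) the estimate should come out near 2; heavier-tailed gradient noise yields α < 2, which is the paper's central observation.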
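
For concreteness, here is a minimal PyTorch sketch of the reported setup (minibatch size b = 500, step size η = 0.1, cross-entropy loss) on MNIST, showing how stochastic gradient noise could be collected at a single SGD step. The two-layer network and the noise-collection logic are illustrative assumptions, not the authors' method; their actual implementation is in the linked repository.

```python
# Minimal sketch under assumptions: a small fully connected net on MNIST,
# with the reported b = 500, eta = 0.1, and cross-entropy loss.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

torch.manual_seed(0)

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128),
                      nn.ReLU(), nn.Linear(128, 10))
loss_fn = nn.CrossEntropyLoss()                           # negative log-likelihood
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # eta = 0.1

train_set = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=500, shuffle=True)  # b = 500

def flat_grad(loss):
    """Backprop `loss` and return all parameter gradients as one vector."""
    optimizer.zero_grad()
    loss.backward()
    return torch.cat([p.grad.detach().reshape(-1) for p in model.parameters()])

# Full-batch gradient, accumulated as a weighted average of minibatch gradients.
g_full = torch.zeros(sum(p.numel() for p in model.parameters()))
for x, y in loader:
    g_full += flat_grad(loss_fn(model(x), y)) * len(x) / len(train_set)

# One SGD step; the gradient noise is the minibatch/full-batch gap.
x, y = next(iter(loader))
g_mini = flat_grad(loss_fn(model(x), y))
noise = g_mini - g_full
optimizer.step()
```

The resulting `noise` vector can then be passed (e.g. as `noise.numpy()`) to `estimate_alpha` above to measure its tail index.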