A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks

Authors: Umut Simsekli, Levent Sagun, Mert Gurbuzbalaban

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To validate the α-stable assumption, we conduct experiments on common deep learning scenarios and show that in all settings, the GN is highly non-Gaussian and admits heavy-tails. We investigate the tail behavior in varying network architectures and sizes, loss functions, and datasets. (A tail-index estimation sketch follows the table.)
Researcher Affiliation | Academia | 1 LTCI, Télécom ParisTech, Université Paris-Saclay, 75013, Paris, France; 2 Institute of Physics, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland; 3 Department of Management Science and Information Systems, Rutgers Business School, NJ 08854, USA.
Pseudocode | No | No pseudocode or algorithm block is explicitly labeled or provided.
Open Source Code | Yes | The codebase is implemented in python using pytorch and provided in https://github.com/umutsimsekli/sgd_tail_index.
Open Datasets | Yes | We randomly split the MNIST dataset into train and test parts of sizes 60K and 10K, and CIFAR10 and CIFAR100 datasets into train and test parts of sizes 50K and 10K, respectively.
Dataset Splits | No | We randomly split the MNIST dataset into train and test parts of sizes 60K and 10K, and CIFAR10 and CIFAR100 datasets into train and test parts of sizes 50K and 10K, respectively.
Hardware Specification | No | Total runtime is 3 weeks on 8 relatively modern GPUs.
Software Dependencies | No | The codebase is implemented in python using pytorch and provided in https://github.com/umutsimsekli/sgd_tail_index.
Experiment Setup | Yes | Unless stated otherwise, we set the minibatch size b = 500 and the step-size η = 0.1. For both fully connected and convolutional settings, we run each configuration with the negative-log-likelihood (i.e. cross entropy) and with the linear hinge loss, and we repeat each experiment with three different random seeds. (A configuration sketch follows the table.)
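The heavy-tail claim quoted under Research Type rests on estimating the tail index α of the stochastic gradient noise. As a point of reference, the sketch below is a minimal NumPy implementation of a multi-block tail-index estimator of the kind the paper builds on (Mohammadi et al., 2015); the function name, the block-size choice, and the noise-collection procedure described in the comments are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def estimate_alpha(x, k1):
    """Multi-block tail-index estimator for symmetric alpha-stable samples
    (in the spirit of Mohammadi et al., 2015): samples are grouped into
    blocks of size k1, and 1/alpha is estimated from the gap between the
    mean log-magnitude of the block sums and that of the raw samples."""
    x = np.asarray(x, dtype=np.float64).ravel()
    k2 = x.size // k1              # number of blocks
    x = x[: k1 * k2]               # drop the remainder so blocks are full
    y = x.reshape(k2, k1).sum(axis=1)   # block sums Y_i
    eps = 1e-12                    # numerical guard against log(0)
    inv_alpha = (np.log(np.abs(y) + eps).mean()
                 - np.log(np.abs(x) + eps).mean()) / np.log(k1)
    return 1.0 / inv_alpha

# Usage sketch: collect gradient-noise samples as the difference between a
# minibatch gradient and the full-batch gradient, flatten them across all
# parameters, and pass them to estimate_alpha (k1 is a design choice, e.g. 100).
```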
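For the dataset splits and the quoted hyper-parameters (minibatch size b = 500, step-size η = 0.1, cross-entropy or linear hinge loss on MNIST/CIFAR-10/CIFAR-100), the sketch below shows one way such a configuration might be assembled with torchvision and plain SGD. It assumes the standard torchvision splits, which match the quoted 60K/10K and 50K/10K sizes; the network architecture and data directory are illustrative, and the authors' actual code lives in the linked repository.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Standard torchvision splits match the quoted sizes:
# MNIST 60K train / 10K test; CIFAR-10 and CIFAR-100 50K train / 10K test.
transform = transforms.ToTensor()
train_set = datasets.MNIST("./data", train=True, download=True, transform=transform)

# Quoted hyper-parameters: minibatch size b = 500, step-size eta = 0.1.
train_loader = DataLoader(train_set, batch_size=500, shuffle=True)

# Illustrative fully connected network (the paper varies depth and width).
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128),
                      nn.ReLU(), nn.Linear(128, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()  # the paper also reports a linear hinge loss

for x, y in train_loader:          # one epoch of plain SGD
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
```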