Heavy Tails in SGD and Compressibility of Overparametrized Neural Networks

Authors: Melih Barsbey, Milad Sefidgaran, Murat A. Erdogdu, Gaël Richard, Umut Simsekli

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present experiments conducted with neural networks to investigate our theory.
Researcher Affiliation | Academia | Melih Barsbey, Boğaziçi University, melih.barsbey@boun.edu.tr; Milad Sefidgaran, LTCI, Télécom Paris, Institut Polytechnique de Paris, milad.sefidgaran@telecom-paris.fr; Murat A. Erdogdu, University of Toronto & Vector Institute, erdogdu@cs.toronto.edu; Gaël Richard, LTCI, Télécom Paris, Institut Polytechnique de Paris, gael.richard@telecom-paris.fr; Umut Şimşekli, INRIA & ENS, PSL Research University, umut.simsekli@inria.fr
Pseudocode | No | The paper describes algorithms such as SGD and pruning techniques through textual descriptions and mathematical equations, but it includes no structured pseudocode blocks or figures explicitly labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | We provided these [code, data, and instructions needed to reproduce the main experimental results] in the supplementary material. The code base of this paper was made accessible through the supplementary material.
Open Datasets | Yes | Each model is trained on MNIST [LCB10] and CIFAR10 [Kri09] datasets under various hyperparameter settings, using the default splits for training and evaluation. We use publicly available and non-personal data only, namely the MNIST and CIFAR10 datasets. (A hedged data-loading sketch is given after the table.)
Dataset Splits | No | The paper states 'using the default splits for training and evaluation' but does not give explicit percentages, sample counts, or any specific methodology for a validation split.
Hardware Specification | No | The paper does not describe the specific hardware used for its experiments (e.g., GPU models, CPU specifications, or cloud resources). It states 'We provided these [total amount of compute and the type of resources used] in the supplementary material', but this information is not available in the provided text.
Software Dependencies | No | The paper does not provide version numbers for any software dependencies. PyTorch is cited, but the version used is not mentioned.
Experiment Setup | Yes | The training hyperparameter settings include two batch sizes (b = 50, 100) and various learning rates (η) to generate a large range of η/b values. All models were trained with SGD until convergence with constant learning rates and no momentum. The convergence criteria comprised 100% training accuracy and a training negative log-likelihood less than 5 × 10^-5. (A training-loop sketch under these settings follows the table.)
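
For concreteness, here is a minimal sketch of how the "default splits" of MNIST and CIFAR10 are typically obtained in PyTorch/torchvision. The torchvision calls, transform, data directory, and loader settings below are illustrative assumptions, not taken from the authors' released code.

```python
# Minimal sketch: MNIST and CIFAR10 with their default train/test splits.
# The transform and data root are assumptions for illustration only.
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

transform = T.ToTensor()  # assumption: no preprocessing is specified in this excerpt

mnist_train = torchvision.datasets.MNIST("data", train=True, download=True, transform=transform)
mnist_test = torchvision.datasets.MNIST("data", train=False, download=True, transform=transform)

cifar_train = torchvision.datasets.CIFAR10("data", train=True, download=True, transform=transform)
cifar_test = torchvision.datasets.CIFAR10("data", train=False, download=True, transform=transform)

# b = 50 and b = 100 are the two batch sizes reported in the paper.
train_loader = DataLoader(mnist_train, batch_size=50, shuffle=True)
test_loader = DataLoader(mnist_test, batch_size=50, shuffle=False)
```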
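Likewise, a minimal sketch of a training loop matching the quoted setup: SGD with a constant learning rate and no momentum, run until 100% training accuracy and a training negative log-likelihood below 5 × 10^-5. The model architecture, learning-rate value, and epoch cap are placeholders rather than the paper's actual choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder model; the paper's architectures are not given in this excerpt.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU(), nn.Linear(256, 10))

# Constant learning rate, no momentum, as in the quoted setup.
# lr = 0.1 is an illustrative value; the paper sweeps a range of eta/b ratios.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.0)

def train_until_convergence(model, loader, max_epochs=1000):
    for epoch in range(max_epochs):
        correct, total, nll_sum = 0, 0, 0.0
        for x, y in loader:
            optimizer.zero_grad()
            logits = model(x)
            loss = F.cross_entropy(logits, y)
            loss.backward()
            optimizer.step()
            nll_sum += loss.item() * y.size(0)
            correct += (logits.argmax(dim=1) == y).sum().item()
            total += y.size(0)
        train_acc = correct / total
        train_nll = nll_sum / total
        # Convergence criteria quoted from the paper:
        # 100% training accuracy and training NLL below 5e-5.
        if train_acc == 1.0 and train_nll < 5e-5:
            break
    return model

trained_model = train_until_convergence(model, train_loader)
```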