Heavy Tails in SGD and Compressibility of Overparametrized Neural Networks
Authors: Melih Barsbey, Milad Sefidgaran, Murat A. Erdogdu, Gaël Richard, Umut Şimşekli
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present experiments conducted with neural networks to investigate our theory. |
| Researcher Affiliation | Academia | Melih Barsbey, Boğaziçi University, melih.barsbey@boun.edu.tr; Milad Sefidgaran, LTCI, Télécom Paris, Institut Polytechnique de Paris, milad.sefidgaran@telecom-paris.fr; Murat A. Erdogdu, University of Toronto & Vector Institute, erdogdu@cs.toronto.edu; Gaël Richard, LTCI, Télécom Paris, Institut Polytechnique de Paris, gael.richard@telecom-paris.fr; Umut Şimşekli, INRIA & ENS, PSL Research University, umut.simsekli@inria.fr |
| Pseudocode | No | The paper describes algorithms like SGD and pruning techniques using textual descriptions and mathematical equations, but does not include any structured pseudocode blocks or figures explicitly labeled as 'Pseudocode' or 'Algorithm'. (A generic magnitude-pruning sketch follows this table for illustration.) |
| Open Source Code | Yes | We provided these [code, data, and instructions needed to reproduce the main experimental results] in the supplementary material. The paper's code base was made accessible through the supplementary material. |
| Open Datasets | Yes | Each model is trained on MNIST [LCB10] and CIFAR10 [Kri09] datasets under various hyperparameter settings, using the default splits for training and evaluation. We use publicly available and non-personal data only, namely MNIST and CIFAR10 datasets. (A data-loading sketch follows this table.) |
| Dataset Splits | No | The paper states 'using the default splits for training and evaluation' but does not specify percentages, sample counts, or any methodology for a validation split. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used for its experiments (e.g., GPU models, CPU specifications, or cloud resources). It states 'We provided these [total amount of compute and the type of resources used] in the supplementary material' but this information is not available in the provided text. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies. While PyTorch is cited, the version used is not mentioned. |
| Experiment Setup | Yes | The training hyperparameter settings include two batch sizes (b = 50, 100) and various learning rates (η) to generate a large range of η/b values. All models were trained with SGD until convergence, with constant learning rates and no momentum. The convergence criteria comprised 100% training accuracy and a training negative log-likelihood below 5 * 10^-5. (A training-loop sketch follows this table.) |
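The datasets row above quotes the paper's use of the default MNIST/CIFAR10 splits. For concreteness, here is a minimal sketch of what those defaults look like in torchvision; the root path `./data`, the choice of MNIST over CIFAR10, and the omission of normalization are illustrative assumptions, not details taken from the paper.

```python
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

# ToTensor scales pixels to [0, 1]; any normalization the paper may use is omitted.
transform = T.ToTensor()

# torchvision's default splits: train=True is the standard 60k-image MNIST
# training set, train=False the standard 10k-image test set. CIFAR10 works
# the same way via torchvision.datasets.CIFAR10.
train_set = torchvision.datasets.MNIST(root="./data", train=True,
                                       download=True, transform=transform)
test_set = torchvision.datasets.MNIST(root="./data", train=False,
                                      download=True, transform=transform)

# The paper reports batch sizes b = 50 and b = 100.
train_loader = DataLoader(train_set, batch_size=50, shuffle=True)
test_loader = DataLoader(test_set, batch_size=100)
```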
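The setup row specifies SGD with a constant learning rate, no momentum, and training until 100% training accuracy with training NLL below 5 * 10^-5. A minimal sketch of that loop follows; the model architecture, the learning rate value, and the helper names `converged` and `train_to_convergence` are hypothetical, chosen only to mirror the stated criteria.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def converged(model, loader, device="cpu"):
    """Check the paper's stated stopping criteria: 100% training accuracy
    and mean training negative log-likelihood below 5e-5."""
    model.eval()
    correct, nll_sum, n = 0, 0.0, 0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            logits = model(x)
            nll_sum += F.cross_entropy(logits, y, reduction="sum").item()
            correct += (logits.argmax(dim=1) == y).sum().item()
            n += y.numel()
    return correct == n and nll_sum / n < 5e-5

def train_to_convergence(model, train_loader, lr, device="cpu"):
    # Constant learning rate and no momentum, per the setup row above.
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.0)
    model.to(device)
    while not converged(model, train_loader, device):
        model.train()
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
    return model

# Example with a hypothetical small fully connected MNIST model; the paper's
# exact architectures are not reproduced here.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 512),
                      nn.ReLU(), nn.Linear(512, 10))
# model = train_to_convergence(model, train_loader, lr=0.05)
```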
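Finally, the pseudocode row notes that the paper describes its pruning experiments in prose rather than structured pseudocode. For orientation only, here is a generic global magnitude-pruning sketch of the kind the paper studies; the function name `magnitude_prune` and the global-threshold design are assumptions, not the paper's exact procedure.

```python
import torch
import torch.nn as nn

def magnitude_prune(model: nn.Module, ratio: float) -> nn.Module:
    """Zero out roughly the fraction `ratio` of weights with the smallest
    absolute values, using a single threshold computed over all weight
    matrices (a common 'global' magnitude-pruning variant)."""
    weights = torch.cat([p.detach().abs().flatten()
                         for name, p in model.named_parameters()
                         if "weight" in name])
    k = int(ratio * weights.numel())
    if k == 0:
        return model
    # k-th smallest magnitude serves as the global pruning threshold.
    threshold = weights.kthvalue(k).values
    with torch.no_grad():
        for name, p in model.named_parameters():
            if "weight" in name:
                p.mul_((p.abs() > threshold).to(p.dtype))
    return model
```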