Heavy Tails in SGD and Compressibility of Overparametrized Neural Networks
Authors: Melih Barsbey, Milad Sefidgaran, Murat A. Erdogdu, Gaël Richard, Umut Şimşekli
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present experiments conducted with neural networks to investigate our theory. |
| Researcher Affiliation | Academia | Melih Barsbey, Boğaziçi University, melih.barsbey@boun.edu.tr; Milad Sefidgaran, LTCI, Télécom Paris, Institut Polytechnique de Paris, milad.sefidgaran@telecom-paris.fr; Murat A. Erdogdu, University of Toronto & Vector Institute, erdogdu@cs.toronto.edu; Gaël Richard, LTCI, Télécom Paris, Institut Polytechnique de Paris, gael.richard@telecom-paris.fr; Umut Şimşekli, INRIA & ENS, PSL Research University, umut.simsekli@inria.fr |
| Pseudocode | No | The paper describes algorithms like SGD and pruning techniques using textual descriptions and mathematical equations, but does not include any structured pseudocode blocks or figures explicitly labeled as 'Pseudocode' or 'Algorithm'. (A generic magnitude-pruning sketch follows this table for illustration.) |
| Open Source Code | Yes | We provided these [code, data, and instructions needed to reproduce the main experimental results] in the supplementary material. The paper's code base was made accessible through the supplementary material. |
| Open Datasets | Yes | Each model is trained on MNIST [LCB10] and CIFAR10 [Kri09] datasets under various hyperparameter settings, using the default splits for training and evaluation. We use publicly available and non-personal data only, namely MNIST and CIFAR10 datasets. (A data-loading sketch follows this table.) |
| Dataset Splits | No | The paper states 'using the default splits for training and evaluation' but does not specify percentages, sample counts, or any methodology for a validation split. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used for its experiments (e.g., GPU models, CPU specifications, or cloud resources). It states 'We provided these [total amount of compute and the type of resources used] in the supplementary material' but this information is not available in the provided text. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies. While PyTorch is cited, the version used is not mentioned. |
| Experiment Setup | Yes | The training hyperparameter settings include two batch sizes (b = 50, 100) and various learning rates (η) to generate a large range of η/b values. All models were trained with SGD until convergence, with constant learning rates and no momentum. The convergence criteria comprised 100% training accuracy and a training negative log-likelihood below 5 * 10^-5. (A training-loop sketch follows this table.) |
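The datasets row above quotes the paper's use of the default MNIST/CIFAR10 splits. For concreteness, here is a minimal sketch of what those defaults look like in torchvision; the root path `./data`, the choice of MNIST over CIFAR10, and the omission of normalization are illustrative assumptions, not details taken from the paper.

```python
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

# ToTensor scales pixels to [0, 1]; any normalization the paper may use is omitted.
transform = T.ToTensor()

# torchvision's default splits: train=True is the standard 60k-image MNIST
# training set, train=False the standard 10k-image test set. CIFAR10 works
# the same way via torchvision.datasets.CIFAR10.
train_set = torchvision.datasets.MNIST(root="./data", train=True,
                                       download=True, transform=transform)
test_set = torchvision.datasets.MNIST(root="./data", train=False,
                                      download=True, transform=transform)

# The paper reports batch sizes b = 50 and b = 100.
train_loader = DataLoader(train_set, batch_size=50, shuffle=True)
test_loader = DataLoader(test_set, batch_size=100)
```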
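The setup row specifies SGD with a constant learning rate, no momentum, and training until 100% training accuracy with training NLL below 5 * 10^-5. A minimal sketch of that loop follows; the model architecture, the learning rate value, and the helper names `converged` and `train_to_convergence` are hypothetical, chosen only to mirror the stated criteria.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def converged(model, loader, device="cpu"):
    """Check the paper's stated stopping criteria: 100% training accuracy
    and mean training negative log-likelihood below 5e-5."""
    model.eval()
    correct, nll_sum, n = 0, 0.0, 0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            logits = model(x)
            nll_sum += F.cross_entropy(logits, y, reduction="sum").item()
            correct += (logits.argmax(dim=1) == y).sum().item()
            n += y.numel()
    return correct == n and nll_sum / n < 5e-5

def train_to_convergence(model, train_loader, lr, device="cpu"):
    # Constant learning rate and no momentum, per the setup row above.
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.0)
    model.to(device)
    while not converged(model, train_loader, device):
        model.train()
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
    return model

# Example with a hypothetical small fully connected MNIST model; the paper's
# exact architectures are not reproduced here.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 512),
                      nn.ReLU(), nn.Linear(512, 10))
# model = train_to_convergence(model, train_loader, lr=0.05)
```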
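Finally, the pseudocode row notes that the paper describes its pruning experiments in prose rather than structured pseudocode. For orientation only, here is a generic global magnitude-pruning sketch of the kind the paper studies; the function name `magnitude_prune` and the global-threshold design are assumptions, not the paper's exact procedure.

```python
import torch
import torch.nn as nn

def magnitude_prune(model: nn.Module, ratio: float) -> nn.Module:
    """Zero out roughly the fraction `ratio` of weights with the smallest
    absolute values, using a single threshold computed over all weight
    matrices (a common 'global' magnitude-pruning variant)."""
    weights = torch.cat([p.detach().abs().flatten()
                         for name, p in model.named_parameters()
                         if "weight" in name])
    k = int(ratio * weights.numel())
    if k == 0:
        return model
    # k-th smallest magnitude serves as the global pruning threshold.
    threshold = weights.kthvalue(k).values
    with torch.no_grad():
        for name, p in model.named_parameters():
            if "weight" in name:
                p.mul_((p.abs() > threshold).to(p.dtype))
    return model
```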