Approximate Heavy Tails in Offline (Multi-Pass) Stochastic Gradient Descent

Authors: Krunoslav Lehman Pavasovic, Alain Durmus, Umut Simsekli

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we illustrate our theory on various experiments conducted on synthetic data and neural networks.
Researcher Affiliation | Academia | Krunoslav Lehman Pavasovic (Inria Paris, CNRS, Ecole Normale Supérieure, PSL Research University, Paris, France; krunoslav.lehman-pavasovic@inria.fr); Alain Durmus (CMAP, CNRS, Ecole Polytechnique, Institut Polytechnique de Paris, Paris, France; alain.durmus@polytechnique.edu); Umut Simsekli (Inria Paris, CNRS, Ecole Normale Supérieure, PSL Research University, Paris, France; umut.simsekli@inria.fr)
Pseudocode | No | The paper does not contain pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | The code scripts for reproducing the experimental results can be accessed at github.com/krunolp/offline_ht.
Open Datasets | Yes | To further illustrate this observation, as a preliminary exploration, we run offline SGD in a 100-dimensional linear regression problem, as well as a classification problem on the MNIST dataset, using a fully-connected, 3-layer neural network. ... The models are trained for 10,000 iterations using cross-entropy loss on the MNIST and CIFAR-10 datasets.
Dataset Splits | No | The paper mentions using subsets of the training data (25%, 50%, 75%) but does not specify a train/validation/test split, nor does it explicitly mention a validation set.
Hardware Specification | No | The paper does not describe the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper does not provide version numbers for the software dependencies or libraries used in the experiments.
Experiment Setup | Yes | We vary the learning rate from 10^-4 to 10^-1, and the batch size b from 1 to 10, with offline SGD utilizing 25%, 50%, and 75% of the training data. (A minimal sketch of this sweep follows the table.)
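
To make the reported sweep concrete, here is a minimal, hypothetical sketch of offline (multi-pass) SGD on a synthetic 100-dimensional linear regression, varying the learning rate, batch size, and fraction of training data as in the Experiment Setup row. It is not the authors' implementation (that lives at github.com/krunolp/offline_ht); the data-generating process, feature scaling, and iteration count below are assumptions chosen so the toy sweep runs quickly and stably.

```python
# Hypothetical sketch of the setup described above: offline (multi-pass) SGD
# on a synthetic 100-dimensional linear regression, sweeping learning rate,
# batch size, and the fraction of training data. Not the authors' code; the
# data generation, scaling, and iteration count are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)


def offline_sgd(X, y, lr, batch_size, n_iters):
    """Multi-pass SGD on the fixed dataset (X, y) with squared-error loss."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        # Resample a minibatch from the same fixed training set at every step
        # (this reuse of the data is what "offline"/multi-pass means here).
        idx = rng.choice(n, size=batch_size, replace=False)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch_size
        w -= lr * grad
    return w


# Synthetic 100-dimensional regression data (assumed; features are scaled by
# 1/sqrt(d) so that even the largest learning rate in the sweep stays stable).
d, n_total = 100, 1_000
w_star = rng.normal(size=d)
X_full = rng.normal(size=(n_total, d)) / np.sqrt(d)
y_full = X_full @ w_star + 0.1 * rng.normal(size=n_total)

# Sweep the values reported in the "Experiment Setup" row.
for frac in (0.25, 0.50, 0.75):              # fraction of training data used
    n = int(frac * n_total)
    X, y = X_full[:n], y_full[:n]
    for lr in (1e-4, 1e-3, 1e-2, 1e-1):      # learning rates from 1e-4 to 1e-1
        for b in (1, 5, 10):                 # batch sizes from 1 to 10
            w_hat = offline_sgd(X, y, lr=lr, batch_size=b, n_iters=2_000)
            err = float(np.linalg.norm(w_hat - w_star))
            print(f"frac={frac:.2f} lr={lr:.0e} b={b:2d} ||w - w*|| = {err:.3f}")
```

The paper's neural-network experiments apply the analogous sweep to a fully-connected, 3-layer network trained with cross-entropy for 10,000 iterations on MNIST and CIFAR-10; the sketch above only mirrors the structure of the sweep, not those models.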