Approximate Heavy Tails in Offline (Multi-Pass) Stochastic Gradient Descent
Authors: Krunoslav Lehman Pavasovic, Alain Durmus, Umut Simsekli
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we illustrate our theory on various experiments conducted on synthetic data and neural networks. |
| Researcher Affiliation | Academia | Krunoslav Lehman Pavasovic (Inria Paris, CNRS, Ecole Normale Supérieure, PSL Research University, Paris, France; krunoslav.lehman-pavasovic@inria.fr); Alain Durmus (CMAP, CNRS, Ecole Polytechnique, Institut Polytechnique de Paris, Paris, France; alain.durmus@polytechnique.edu); Umut Simsekli (Inria Paris, CNRS, Ecole Normale Supérieure, PSL Research University, Paris, France; umut.simsekli@inria.fr) |
| Pseudocode | No | The paper does not contain pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | The code scripts for reproducing the experimental results can be accessed at github.com/krunolp/offline_ht. |
| Open Datasets | Yes | To further illustrate this observation, as a preliminary exploration, we run offline SGD in a 100-dimensional linear regression problem, as well as a classification problem on the MNIST dataset, using a fully-connected, 3-layer neural network. ... The models are trained for 10,000 iterations using cross-entropy loss on the MNIST and CIFAR-10 datasets. |
| Dataset Splits | No | The paper mentions using subsets of training data (25%, 50%, 75%) but does not specify a clear train/validation/test split for reproducibility, nor does it explicitly mention a validation set. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | We vary the learning rate from 10⁻⁴ to 10⁻¹, and the batch size b from 1 to 10, with offline SGD utilizing 25%, 50%, and 75% of the training data. (A hedged code sketch of this setup appears after the table.) |
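
For concreteness, the sketch below illustrates one way to reproduce the offline (multi-pass) SGD sweep described in the "Experiment Setup" row: a 100-dimensional synthetic linear regression trained with learning rates between 10⁻⁴ and 10⁻¹, batch sizes from 1 to 10, and 25%/50%/75% subsets of the training data. This is not the authors' released code (see github.com/krunolp/offline_ht for that); the function names, the NumPy implementation, and the synthetic data generation are illustrative assumptions.

```python
# Hedged sketch of the offline (multi-pass) SGD experiment setup, assuming a
# least-squares loss on synthetic data; not the authors' implementation.
import numpy as np

def offline_sgd(X, y, lr, batch_size, n_iters, seed=0):
    """Multi-pass SGD on least-squares loss over a fixed dataset; returns the iterates."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    iterates = []
    for _ in range(n_iters):
        # Offline regime: minibatches are resampled from the same finite dataset.
        idx = rng.choice(n, size=batch_size, replace=False)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch_size
        w = w - lr * grad
        iterates.append(w.copy())
    return np.array(iterates)

# Synthetic data: d = 100, matching the paper's linear-regression illustration.
rng = np.random.default_rng(42)
d, n_total = 100, 2_000
w_star = rng.normal(size=d)
X_full = rng.normal(size=(n_total, d))
y_full = X_full @ w_star + 0.1 * rng.normal(size=n_total)

# Grid matching the reported ranges: learning rates in [1e-4, 1e-1], batch
# sizes from 1 to 10, and 25%/50%/75% of the training data used by offline SGD.
for frac in (0.25, 0.50, 0.75):
    n = int(frac * n_total)
    X, y = X_full[:n], y_full[:n]
    for lr in (1e-4, 1e-3, 1e-2, 1e-1):
        for b in (1, 5, 10):
            iters = offline_sgd(X, y, lr=lr, batch_size=b, n_iters=1_000)
            err = np.linalg.norm(iters[-1] - w_star)
            print(f"frac={frac:.2f} lr={lr:.0e} b={b}: final ||w - w*|| = {err:.3f}")
```

In the paper the iterates from runs like these are then used to estimate the tail index of the stationary distribution; that estimation step is omitted here.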