Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Approximate Heavy Tails in Offline (Multi-Pass) Stochastic Gradient Descent

Authors: Kruno Lehman, Alain Durmus, Umut Simsekli

NeurIPS 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we illustrate our theory on various experiments conducted on synthetic data and neural networks.
Researcher Affiliation | Academia | Krunoslav Lehman Pavasovic, Inria Paris, CNRS, Ecole Normale Supérieure, PSL Research University, Paris, France, EMAIL; Alain Durmus, CMAP, CNRS, Ecole Polytechnique, Institut Polytechnique de Paris, Paris, France, EMAIL; Umut Şimşekli, Inria Paris, CNRS, Ecole Normale Supérieure, PSL Research University, Paris, France, EMAIL
Pseudocode | No | The paper does not contain pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | The code scripts for reproducing the experimental results can be accessed at github.com/krunolp/offline_ht.
Open Datasets | Yes | To further illustrate this observation, as a preliminary exploration, we run offline SGD in a 100-dimensional linear regression problem, as well as a classification problem on the MNIST dataset, using a fully-connected, 3-layer neural network. ... The models are trained for 10,000 iterations using cross-entropy loss on the MNIST and CIFAR-10 datasets.
Dataset Splits | No | The paper mentions using subsets of training data (25%, 50%, 75%) but does not specify a clear train/validation/test split for reproducibility, nor does it explicitly mention a validation set.
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the experiments.
Experiment Setup | Yes | We vary the learning rate from 10^-4 to 10^-1, and the batch size b from 1 to 10, with offline SGD utilizing 25%, 50%, and 75% of the training data.
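
The experiment setup quoted above can be sketched as a small hyperparameter grid plus a multi-pass SGD loop on a fixed subset of the data. This is only an illustrative sketch, not the authors' code (which is at github.com/krunolp/offline_ht). The function names make_grid and offline_sgd, the log-spaced learning rates, the number of learning-rate values, the least-squares loss, and the sampling scheme are all assumptions; the summary gives only the endpoints of each range.

```python
import itertools
import numpy as np

def make_grid(n_lr=4, batch_sizes=range(1, 11), fractions=(0.25, 0.5, 0.75)):
    """Cartesian product of learning rates, batch sizes, and data fractions."""
    # Assumption: learning rates are log-spaced between the quoted endpoints
    # 1e-4 and 1e-1; the number of values (n_lr) is hypothetical.
    lrs = np.logspace(-4, -1, n_lr)
    return list(itertools.product(lrs, batch_sizes, fractions))

def offline_sgd(X, y, lr, batch_size, fraction, n_iters=1000, seed=0):
    """Multi-pass SGD on least squares, resampling from a fixed data subset."""
    rng = np.random.default_rng(seed)
    # Fix the training subset once; every iteration reuses these same points,
    # which is what makes the procedure "offline" rather than streaming.
    n_sub = max(batch_size, int(fraction * len(X)))
    pool = rng.choice(len(X), size=n_sub, replace=False)
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        # Assumption: each mini-batch is drawn without replacement from the pool.
        batch = rng.choice(pool, size=batch_size, replace=False)
        residual = X[batch] @ w - y[batch]
        w -= lr * (X[batch].T @ residual) / batch_size
    return w
```

Each element of make_grid() is a (learning rate, batch size, fraction) triple that can be passed to offline_sgd together with a synthetic design matrix X and targets y, for instance a 100-dimensional Gaussian design to mirror the linear regression experiment mentioned in the table. Larger learning rates combined with small batches may make the iterates diverge in this sketch, so the triples are best explored one at a time.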