Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Approximate Heavy Tails in Offline (Multi-Pass) Stochastic Gradient Descent
Authors: Krunoslav Lehman Pavasovic, Alain Durmus, Umut Simsekli
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we illustrate our theory on various experiments conducted on synthetic data and neural networks. |
| Researcher Affiliation | Academia | Krunoslav Lehman Pavasovic: Inria Paris, CNRS, Ecole Normale Supérieure, PSL Research University, Paris, France. Alain Durmus: CMAP, CNRS, Ecole Polytechnique, Institut Polytechnique de Paris, Paris, France. Umut Simsekli: Inria Paris, CNRS, Ecole Normale Supérieure, PSL Research University, Paris, France. |
| Pseudocode | No | The paper does not contain pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | The code scripts for reproducing the experimental results can be accessed at github.com/krunolp/offline_ht. |
| Open Datasets | Yes | To further illustrate this observation, as a preliminary exploration, we run offline SGD in a 100-dimensional linear regression problem, as well as a classification problem on the MNIST dataset, using a fully-connected, 3-layer neural network. ... The models are trained for 10,000 iterations using cross-entropy loss on the MNIST and CIFAR-10 datasets. |
| Dataset Splits | No | The paper mentions using subsets of training data (25%, 50%, 75%) but does not specify a clear train/validation/test split for reproducibility, nor does it explicitly mention a validation set. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | We vary the learning rate from 10⁻⁴ to 10⁻¹, and the batch size b from 1 to 10, with offline SGD utilizing 25%, 50%, and 75% of the training data. |
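As a rough illustration of the setup quoted in the "Open Datasets" and "Experiment Setup" rows, the sketch below runs multi-pass (offline) SGD on a synthetic 100-dimensional linear regression problem, sweeping the learning rate, batch size, and fraction of the training data used. This is a minimal reconstruction from the quoted text, not the authors' implementation (which is available at github.com/krunolp/offline_ht); the function name `offline_sgd`, the dataset sizes, and the iteration counts are illustrative assumptions.

```python
import numpy as np

def offline_sgd(X, y, lr, batch_size, n_iters=1_000, seed=0):
    """Multi-pass SGD on least-squares loss over a FIXED finite dataset.

    Each minibatch is resampled from the same subsample, which is what
    distinguishes the offline (multi-pass) regime from online SGD.
    Names and defaults here are illustrative, not the authors' code.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        idx = rng.choice(n, size=batch_size, replace=False)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch_size
        w -= lr * grad
        if not np.all(np.isfinite(w)):  # large lr / small b can diverge on this toy problem
            break
    return w

# Synthetic 100-dimensional regression problem (sizes are assumptions).
rng = np.random.default_rng(42)
d, n_total = 100, 5_000
w_star = rng.normal(size=d)
X_full = rng.normal(size=(n_total, d))
y_full = X_full @ w_star + rng.normal(size=n_total)

# Sweep mirroring the quoted grid: lr in [1e-4, 1e-1], b in {1, ..., 10},
# offline SGD using 25%, 50%, and 75% of the training data.
for frac in (0.25, 0.50, 0.75):
    n = int(frac * n_total)
    X, y = X_full[:n], y_full[:n]
    for lr in (1e-4, 1e-2, 1e-1):
        for b in (1, 5, 10):
            w = offline_sgd(X, y, lr=lr, batch_size=b)
            err = np.linalg.norm(w - w_star)
            print(f"frac={frac:.2f} lr={lr:.0e} b={b}: ||w - w*|| = {err:.3f}")
```

The heavy-tail behavior studied in the paper concerns the distribution of such iterates across repeated runs, and in this sketch the largest step-size / smallest batch-size combinations can produce wildly fluctuating or divergent iterates on the toy problem, which is why the loop includes a finiteness check.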