On Convergence-Diagnostic based Step Sizes for Stochastic Gradient Descent
Authors: Scott Pesme, Aymeric Dieuleveut, Nicolas Flammarion
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate our theoretical results with synthetic and real examples. We provide additional experiments in Appendix A.2. |
| Researcher Affiliation | Academia | Scott Pesme 1 Aymeric Dieuleveut 2 Nicolas Flammarion 1 1 Theory of Machine Learning lab, EPFL 2 Ecole Polytechnique. Correspondence to: Scott Pesme <scott.pesme@epfl.ch>. |
| Pseudocode | Yes | Algorithm 1 Convergence-Diagnostic algorithm |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-sourcing of their code for the described methodology. |
| Open Datasets | Yes | ResNet18. We train an 18-layer ResNet model (He et al., 2016) on the CIFAR-10 dataset (Krizhevsky, 2009) using SGD with a momentum of 0.9, weight decay of 0.0001 and batch size of 128. ... We further investigate the performance of the distance-based diagnostic on real-world datasets: the Covertype dataset and the MNIST dataset1. (Footnote 1: Covertype dataset available at archive.ics.uci.edu/ml/datasets/covertype and MNIST at yann.lecun.com/exdb/mnist.) |
| Dataset Splits | No | Each dataset is divided in two equal parts, one for training and one for testing. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper mentions using PyTorch's ReduceLROnPlateau() scheduler but does not specify version numbers for PyTorch or any other software components. |
| Experiment Setup | Yes | ResNet18. We train an 18-layer ResNet model (He et al., 2016) on the CIFAR-10 dataset (Krizhevsky, 2009) using SGD with a momentum of 0.9, weight decay of 0.0001 and batch size of 128. To adapt the distance-based step-size statistic to this scenario, we use PyTorch's ReduceLROnPlateau() scheduler... The parameters of the scheduler are set to: patience = 1000, threshold = 0.01... All initial step sizes are set to 0.1... The initial step size for our distance-based algorithm was set to 4/R2. |
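The table above references Algorithm 1 (a convergence-diagnostic step-size scheme) and a ReduceLROnPlateau-style adaptation: run constant-step-size SGD, and when a distance-based statistic plateaus, decrease the step size. The following is a minimal illustrative sketch of that idea, not the paper's exact statistic or code: it runs SGD on a noisy 1-D quadratic and halves the step size whenever the distance from the last restart point stops increasing for a patience window. All function names and parameter values here (`noisy_grad`, `patience`, `threshold`, the decrease factor `r`) are hypothetical stand-ins.

```python
import random

def noisy_grad(theta, sigma=1.0):
    # Stochastic gradient of f(theta) = 0.5 * theta^2, with Gaussian noise.
    return theta + random.gauss(0.0, sigma)

def sgd_distance_diagnostic(theta0=10.0, gamma=0.1, r=0.5,
                            patience=200, threshold=0.01,
                            n_steps=5000, seed=0):
    """Illustrative convergence-diagnostic SGD (hypothetical parameters):
    track the distance from the iterate at the last step-size change, and
    halve gamma when that distance has plateaued for `patience` steps."""
    random.seed(seed)
    theta = theta0
    restart = theta0      # iterate at the last step-size decrease
    best_dist = 0.0       # largest distance from `restart` seen so far
    stale = 0             # steps without a sufficient relative increase
    schedule = [gamma]    # record of step sizes used
    for _ in range(n_steps):
        theta -= gamma * noisy_grad(theta)
        dist = abs(theta - restart)
        if dist > best_dist * (1.0 + threshold):
            best_dist = dist
            stale = 0
        else:
            stale += 1
        if stale >= patience:   # diagnostic fires: decrease the step size
            gamma *= r
            restart, best_dist, stale = theta, 0.0, 0
            schedule.append(gamma)
    return theta, schedule
```

This mirrors the mechanism the paper adapts via PyTorch's ReduceLROnPlateau(): a patience counter and a relative improvement threshold decide when the diagnostic fires, and the statistic is reset after each step-size decrease.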