On Convergence-Diagnostic based Step Sizes for Stochastic Gradient Descent

Authors: Scott Pesme, Aymeric Dieuleveut, Nicolas Flammarion

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate our theoretical results with synthetic and real examples. We provide additional experiments in Appendix A.2.
Researcher Affiliation | Academia | Scott Pesme (1), Aymeric Dieuleveut (2), Nicolas Flammarion (1). (1) Theory of Machine Learning lab, EPFL; (2) Ecole Polytechnique. Correspondence to: Scott Pesme <scott.pesme@epfl.ch>.
Pseudocode | Yes | Algorithm 1: Convergence-Diagnostic algorithm
Open Source Code | No | The paper does not provide an explicit statement or link for the open-sourcing of their code for the described methodology.
Open Datasets | Yes | ResNet18. We train an 18-layer ResNet model (He et al., 2016) on the CIFAR-10 dataset (Krizhevsky, 2009) using SGD with a momentum of 0.9, weight decay of 0.0001 and batch size of 128. ... We further investigate the performance of the distance-based diagnostic on real-world datasets: the Covertype dataset and the MNIST dataset [1]. (Footnote 1: Covertype dataset available at archive.ics.uci.edu/ml/datasets/covertype and MNIST at yann.lecun.com/exdb/mnist.)
Dataset Splits | No | Each dataset is divided in two equal parts, one for training and one for testing.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments.
Software Dependencies | No | The paper mentions using 'Pytorch's ReduceLROnPlateau() scheduler' but does not specify version numbers for PyTorch or any other software components.
Experiment Setup | Yes | ResNet18. We train an 18-layer ResNet model (He et al., 2016) on the CIFAR-10 dataset (Krizhevsky, 2009) using SGD with a momentum of 0.9, weight decay of 0.0001 and batch size of 128. To adapt the distance-based step-size statistic to this scenario, we use Pytorch's ReduceLROnPlateau() scheduler... The parameters of the scheduler are set to: patience = 1000, threshold = 0.01... All initial step sizes are set to 0.1... The initial step size for our distance-based algorithm was set to 4/R^2.
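
The table names a convergence-diagnostic algorithm and a distance-based statistic but does not reproduce either. As a rough illustration only, the pure-Python sketch below shows one plausible reading of such a scheme: run SGD at a constant step size, watch the squared distance from the last restart point, and shrink the step when that distance stops growing. The function and class names, the toy least-squares problem, and every default (q, k0, thresh, decay) are hypothetical choices made for this sketch. They are not taken from the paper and this is not the authors' code.

```python
import math
import random


class SlopeDiagnostic:
    """Hypothetical distance-based diagnostic (assumed form, not the paper's).

    At geometrically spaced iteration counts n = q**k it estimates the slope
    of log(dist_sq) against log(n) and fires when the slope is below `thresh`,
    i.e. when the iterates have stopped moving away from the restart point.
    """

    def __init__(self, q=2.0, k0=3, thresh=0.6):
        self.q, self.k0, self.thresh = q, k0, thresh
        self.reset()

    def reset(self):
        self.next_check = self.q ** self.k0
        self.prev = None  # (n, dist_sq) recorded at the previous checkpoint

    def __call__(self, dist_sq, n):
        if n < self.next_check:
            return False
        self.next_check *= self.q
        prev, self.prev = self.prev, (n, dist_sq)
        if prev is None or dist_sq <= 0.0 or prev[1] <= 0.0:
            return False
        slope = (math.log(dist_sq) - math.log(prev[1])) / (
            math.log(n) - math.log(prev[0])
        )
        return slope < self.thresh


def sgd_with_diagnostic(grad, theta0, step, n_iters, diagnostic, decay=0.5):
    """Constant-step SGD that multiplies the step by `decay` on each alarm.

    `diagnostic` must be callable as diagnostic(dist_sq, n) and expose reset().
    """
    theta = list(theta0)
    anchor = list(theta0)  # iterate at the last restart
    since_restart = 0
    steps_used = [step]
    for _ in range(n_iters):
        g = grad(theta)
        theta = [t - step * gi for t, gi in zip(theta, g)]
        since_restart += 1
        dist_sq = sum((t - a) ** 2 for t, a in zip(theta, anchor))
        if diagnostic(dist_sq, since_restart):
            step *= decay
            anchor = list(theta)
            since_restart = 0
            diagnostic.reset()
            steps_used.append(step)
    return theta, steps_used


# Toy noisy least-squares problem (an assumption made for this demo).
rng = random.Random(0)
theta_star = [1.0, -2.0]


def noisy_grad(theta):
    x = [rng.gauss(0.0, 1.0) for _ in theta_star]
    y = sum(xi * ti for xi, ti in zip(x, theta_star)) + rng.gauss(0.0, 0.5)
    residual = sum(xi * ti for xi, ti in zip(x, theta)) - y
    return [residual * xi for xi in x]


theta, steps_used = sgd_with_diagnostic(
    noisy_grad, [0.0, 0.0], step=0.1, n_iters=5000, diagnostic=SlopeDiagnostic()
)
print(steps_used[:5], theta)
```

With these small illustrative defaults the alarm can fire often, so the step may shrink faster than is useful; larger q and k0 would make the check less sensitive to noise. For the ResNet18 run, the table reports that PyTorch's ReduceLROnPlateau() scheduler was used instead, which this sketch does not attempt to model.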