Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers

Authors: Robin M Schmidt, Frank Schneider, Philipp Hennig

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To do so, we perform an extensive, standardized benchmark of fifteen particularly popular deep learning optimizers while giving a concise overview of the wide range of possible choices. Analyzing more than 50,000 individual runs, we contribute the following three points: (i) Optimizer performance varies greatly across tasks." "We conduct a large-scale benchmark of optimizers to ground the ongoing debate about deep learning optimizers on empirical evidence, and to help understand how the choice of optimization methods and hyperparameters influences the training performance."
Researcher Affiliation | Academia | "1Methods of Machine Learning, University of Tübingen, Tübingen, Germany 2Max Planck Institute for Intelligent Systems, Tübingen, Germany."
Pseudocode | No | The paper describes optimization methods conceptually but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | "Our open-sourced results are available as challenging and well-tuned baselines for more meaningful evaluations of novel optimization methods without requiring any further computational efforts." https://github.com/SirRob1997/Crowded-Valley---Results
Open Datasets | Yes | "We consider the eight optimization tasks summarized in Table 1, available as the small (P1–P4) and large (P5–P8) problem sets in DEEPOBS." (Table 1 lists MNIST, Fashion-MNIST, CIFAR-10, CIFAR-100, and SVHN.)
Dataset Splits | Yes | "DEEPOBS provides several performance metrics, including the training and test loss, and the validation accuracy. ... Accordingly, the tuning (Section 2.3) is done with respect to the validation metric." (See the selection sketch below.)
Hardware Specification | Yes | "All approximations are for ADAM on a Tesla K80 GPU."
Software Dependencies | Yes | "All experiments were performed using version 1.2.0-beta of DEEPOBS and TensorFlow version 1.15 (Abadi et al., 2015)." (See the runner sketch below.)
Experiment Setup | Yes | "For each problem and optimizer we evaluate all possible combinations of four different tuning budgets (Section 2.3) and four selected learning rate schedules (Section 2.4), covering the following combinatorial space: The first budget consists of just a single run. This one-shot budget uses the default values proposed by the original authors, where available (Table 4 in the appendix lists the default parameters). We choose four different learning rate schedules, trying to cover all major types of schedules (see Appendix E): A constant learning rate; A cosine decay (Loshchilov & Hutter, 2017) as an example of a smooth decay; A cosine with warm restarts schedule (Loshchilov & Hutter, 2017) as a cyclical schedule; A trapezoidal schedule (Xing et al., 2018) from the warm-up schedules introduced in Goyal et al. (2017). To keep the benchmark feasible, we chose to use the fixed L2 regularization and batch size that DEEPOBS suggests for each problem." (See the schedule sketch below.)
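
The tuning protocol quoted under Dataset Splits, where hyperparameters are selected on the validation metric and only the chosen setting is reported on the test set, amounts to the selection step below. This is a minimal sketch; the run records and field names are hypothetical, not taken from DEEPOBS output.

# Hypothetical run records: one entry per tried hyperparameter setting.
runs = [
    {"learning_rate": 1e-2, "val_accuracy": 0.88, "test_accuracy": 0.89},
    {"learning_rate": 1e-3, "val_accuracy": 0.91, "test_accuracy": 0.90},
    {"learning_rate": 1e-4, "val_accuracy": 0.86, "test_accuracy": 0.85},
]

# Pick the configuration with the best validation accuracy ...
best = max(runs, key=lambda run: run["val_accuracy"])

# ... and report its test performance; the test set plays no role in tuning.
print(f"chosen lr={best['learning_rate']}, test accuracy={best['test_accuracy']}")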
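
For the Software Dependencies row, a single benchmark run roughly follows the DEEPOBS runner pattern sketched below. The exact StandardRunner signature changed between DEEPOBS releases, so the argument names here are assumptions to be checked against the 1.2.0-beta documentation rather than the paper's actual driver code.

import tensorflow as tf
from deepobs import tensorflow as tfobs

# Wrap a TensorFlow 1.x optimizer class and declare its tunable hyperparameters.
optimizer_class = tf.train.MomentumOptimizer
hyperparams = [{"name": "momentum", "type": float},
               {"name": "use_nesterov", "type": bool, "default": False}]

runner = tfobs.runners.StandardRunner(optimizer_class, hyperparams)

# Run one DEEPOBS test problem with a fixed learning rate and epoch budget.
runner.run(testproblem="quadratic_deep",
           hyperparams={"momentum": 0.9},
           learning_rate=1e-2,
           num_epochs=10)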
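
The four schedule families quoted under Experiment Setup can be illustrated with the minimal NumPy sketch below. The shapes are the point here; the cycle length, warm-up, and cool-down fractions are placeholder values, not the settings used in the paper.

import numpy as np

def constant(base_lr, num_epochs):
    # Constant schedule: the learning rate never changes.
    return np.full(num_epochs, base_lr, dtype=float)

def cosine_decay(base_lr, num_epochs):
    # Smooth cosine decay from base_lr towards zero over the whole run.
    t = np.arange(num_epochs)
    return 0.5 * base_lr * (1.0 + np.cos(np.pi * t / num_epochs))

def cosine_warm_restarts(base_lr, num_epochs, cycle_length=10):
    # Cyclical schedule: the cosine decay restarts every cycle_length epochs.
    t = np.arange(num_epochs)
    return 0.5 * base_lr * (1.0 + np.cos(np.pi * (t % cycle_length) / cycle_length))

def trapezoidal(base_lr, num_epochs, warmup_frac=0.1, cooldown_frac=0.1):
    # Warm-up schedule: linear ramp up, constant plateau, linear ramp down.
    warmup = int(warmup_frac * num_epochs)
    cooldown = int(cooldown_frac * num_epochs)
    lrs = np.full(num_epochs, base_lr, dtype=float)
    lrs[:warmup] = np.linspace(0.0, base_lr, warmup, endpoint=False)
    lrs[num_epochs - cooldown:] = np.linspace(base_lr, 0.0, cooldown)
    return lrs

Plotting these four arrays for, say, base_lr=0.1 and num_epochs=100 should qualitatively match the schedule types listed in the quoted setup description.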