DeepOBS: A Deep Learning Optimizer Benchmark Suite
Authors: Frank Schneider, Lukas Balles, Philipp Hennig
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | As the primary contribution, we present DEEPOBS, a Python package of deep learning optimization benchmarks. The package addresses key challenges in the quantitative assessment of stochastic optimizers, and automates most steps of benchmarking. The library includes a wide and extensible set of ready-to-use realistic optimization problems, such as training Residual Networks for image classification on IMAGENET or character-level language prediction models, as well as popular classics like MNIST and CIFAR-10. The package also provides realistic baseline results for the most popular optimizers on these test problems, ensuring a fair comparison to the competition when benchmarking new optimizers, and without having to run costly experiments. In Section 4 we report on the performance of SGD, SGD with momentum (MOMENTUM) and ADAM on the small and large benchmarks (this also demonstrates the output of the benchmark). |
| Researcher Affiliation | Academia | Frank Schneider, Lukas Balles & Philipp Hennig, University of Tübingen and Max Planck Institute for Intelligent Systems, Tübingen, Germany, {frank.schneider,lukas.balles,ph}@tue.mpg.de |
| Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper. |
| Open Source Code | Yes | As the primary contribution, we present DEEPOBS, a Python package of deep learning optimization benchmarks... It supports TENSORFLOW and is available open source. Code available at https://github.com/fsschneider/deepobs. |
| Open Datasets | Yes | The library includes a wide and extensible set of ready-to-use realistic optimization problems, such as training Residual Networks for image classification on IMAGENET or character-level language prediction models, as well as popular classics like MNIST and CIFAR-10. ... Table 1: Overview of the test problems included in DEEPOBS... Data set Model Description ... MNIST ... FASHION ... CIFAR-10 ... CIFAR-100 ... SVHN ... IMAGENET ... Tolstoi Char RNN |
| Dataset Splits | Yes | For hyperparameter tuning, we use test accuracy or, if that is not available, test loss, as the criteria. ... when we evaluate on the test set, we also evaluate on a larger chunk of training data, which we call a train eval set. ... The learning rate α was tuned for each optimizer and test problem individually, by evaluating on a logarithmic grid from α_min = 10⁻⁵ to α_max = 10² with 36 samples. In Appendix A, specific details for each test problem are provided, such as 'Trained with a batch size of 128 for 100 epochs.' (P3 FASHION-MNIST CNN). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory, or cloud instance types used for running the experiments. It only vaguely refers to 'substantial computational effort' and 'can be done sequentially on the same hardware'. |
| Software Dependencies | Yes | We have distilled the above ideas into an open-source python package, written in TENSORFLOW (Abadi et al., 2015)... The experiments were done with version 1.1.0 of DEEPOBS. |
| Experiment Setup | Yes | For the baseline results provided with DEEPOBS, we evaluate three popular deep learning optimizers (SGD, MOMENTUM and ADAM) on the eight test problems... All experiments used 0.99 for the MOMENTUM parameter and default parameters for ADAM (β₁ = 0.9, β₂ = 0.999, ε = 10⁻⁸). The learning rate α was tuned for each optimizer and test problem individually, by evaluating on a logarithmic grid from α_min = 10⁻⁵ to α_max = 10² with 36 samples. In Appendix A, specific details for each test problem are provided, such as 'Trained with a batch size of 128 for 100 epochs.' (P3 FASHION-MNIST CNN). |
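
The tuning protocol quoted in the Dataset Splits and Experiment Setup rows is concrete enough to sketch: 36 learning rates spaced logarithmically between 10⁻⁵ and 10², with the best setting chosen by test accuracy (or test loss when accuracy is unavailable). The snippet below is an illustrative sketch only; `evaluate_test_metric` is a hypothetical stand-in for a full training-and-evaluation run on one test problem and is not part of the DeepOBS API.

```python
import numpy as np

# 36 learning rates spaced logarithmically between 1e-5 and 1e2,
# matching the grid described for the DeepOBS baselines.
LEARNING_RATES = np.logspace(-5, 2, num=36)

def tune_learning_rate(evaluate_test_metric, higher_is_better=True):
    """Grid-search the learning rate as in the quoted protocol.

    `evaluate_test_metric` is a hypothetical callable: it trains one test
    problem with the given learning rate and returns the final test
    accuracy (higher_is_better=True) or test loss (higher_is_better=False).
    """
    results = {lr: evaluate_test_metric(lr) for lr in LEARNING_RATES}
    pick = max if higher_is_better else min
    best_lr = pick(results, key=results.get)
    return best_lr, results
```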
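
The three baselines in the Experiment Setup row (SGD, SGD with momentum 0.99, and Adam with its default parameters) map directly onto standard TensorFlow optimizer constructors. Note that the paper's experiments used a TensorFlow 1.x-era DeepOBS release; the sketch below uses today's `tf.keras.optimizers` interface purely to illustrate the quoted hyperparameters.

```python
import tensorflow as tf

def baseline_optimizers(lr):
    """Build the three baseline optimizers at learning rate `lr`.

    Hyperparameters follow the quoted setup: momentum 0.99, and Adam with
    beta_1=0.9, beta_2=0.999, epsilon=1e-8 (its defaults).
    """
    return {
        "SGD": tf.keras.optimizers.SGD(learning_rate=lr),
        "MOMENTUM": tf.keras.optimizers.SGD(learning_rate=lr, momentum=0.99),
        "ADAM": tf.keras.optimizers.Adam(
            learning_rate=lr, beta_1=0.9, beta_2=0.999, epsilon=1e-8
        ),
    }
```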