DeepOBS: A Deep Learning Optimizer Benchmark Suite
Authors: Frank Schneider, Lukas Balles, Philipp Hennig
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | As the primary contribution, we present DEEPOBS, a Python package of deep learning optimization benchmarks. The package addresses key challenges in the quantitative assessment of stochastic optimizers, and automates most steps of benchmarking. The library includes a wide and extensible set of ready-to-use realistic optimization problems, such as training Residual Networks for image classification on IMAGENET or character-level language prediction models, as well as popular classics like MNIST and CIFAR-10. The package also provides realistic baseline results for the most popular optimizers on these test problems, ensuring a fair comparison to the competition when benchmarking new optimizers, and without having to run costly experiments. In Section 4 we report on the performance of SGD, SGD with momentum (MOMENTUM) and ADAM on the small and large benchmarks (this also demonstrates the output of the benchmark). |
| Researcher Affiliation | Academia | Frank Schneider, Lukas Balles & Philipp Hennig, University of Tübingen and Max Planck Institute for Intelligent Systems, Tübingen, Germany, {frank.schneider,lukas.balles,ph}@tue.mpg.de |
| Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper. |
| Open Source Code | Yes | As the primary contribution, we present DEEPOBS, a Python package of deep learning optimization benchmarks... It supports TENSORFLOW and is available open source. Code available at https://github.com/fsschneider/deepobs. |
| Open Datasets | Yes | The library includes a wide and extensible set of ready-to-use realistic optimization problems, such as training Residual Networks for image classification on IMAGENET or character-level language prediction models, as well as popular classics like MNIST and CIFAR-10. ... Table 1: Overview of the test problems included in DEEPOBS... Data set Model Description ... MNIST ... FASHION ... CIFAR-10 ... CIFAR-100 ... SVHN ... IMAGENET ... Tolstoi Char RNN |
| Dataset Splits | Yes | For hyperparameter tuning, we use test accuracy or, if that is not available, test loss, as the criteria. ... when we evaluate on the test set, we also evaluate on a larger chunk of training data, which we call a train eval set. ... The learning rate α was tuned for each optimizer and test problem individually, by evaluating on a logarithmic grid from α_min = 10⁻⁵ to α_max = 10² with 36 samples. In Appendix A, specific details for each test problem are provided, such as 'Trained with a batch size of 128 for 100 epochs.' (P3 FASHION-MNIST CNN). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory, or cloud instance types used for running the experiments. It only vaguely refers to 'substantial computational effort' and 'can be done sequentially on the same hardware'. |
| Software Dependencies | Yes | We have distilled the above ideas into an open-source python package, written in TENSORFLOW (Abadi et al., 2015)... The experiments were done with version 1.1.0 of DEEPOBS. |
| Experiment Setup | Yes | For the baseline results provided with DEEPOBS, we evaluate three popular deep learning optimizers (SGD, MOMENTUM and ADAM) on the eight test problems... All experiments used 0.99 for the MOMENTUM parameter and default parameters for ADAM (β₁ = 0.9, β₂ = 0.999, ε = 10⁻⁸). The learning rate α was tuned for each optimizer and test problem individually, by evaluating on a logarithmic grid from α_min = 10⁻⁵ to α_max = 10² with 36 samples. In Appendix A, specific details for each test problem are provided, such as 'Trained with a batch size of 128 for 100 epochs.' (P3 FASHION-MNIST CNN). |
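
The tuning protocol quoted in the Dataset Splits and Experiment Setup rows is concrete enough to sketch: 36 learning rates spaced logarithmically between 10⁻⁵ and 10², with the best setting chosen by test accuracy (or test loss when accuracy is unavailable). The snippet below is an illustrative sketch only; `evaluate_test_metric` is a hypothetical stand-in for a full training-and-evaluation run on one test problem and is not part of the DeepOBS API.

```python
import numpy as np

# 36 learning rates spaced logarithmically between 1e-5 and 1e2,
# matching the grid described for the DeepOBS baselines.
LEARNING_RATES = np.logspace(-5, 2, num=36)

def tune_learning_rate(evaluate_test_metric, higher_is_better=True):
    """Grid-search the learning rate as in the quoted protocol.

    `evaluate_test_metric` is a hypothetical callable: it trains one test
    problem with the given learning rate and returns the final test
    accuracy (higher_is_better=True) or test loss (higher_is_better=False).
    """
    results = {lr: evaluate_test_metric(lr) for lr in LEARNING_RATES}
    pick = max if higher_is_better else min
    best_lr = pick(results, key=results.get)
    return best_lr, results
```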
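
The three baselines in the Experiment Setup row (SGD, SGD with momentum 0.99, and Adam with its default parameters) map directly onto standard TensorFlow optimizer constructors. Note that the paper's experiments used a TensorFlow 1.x-era DeepOBS release; the sketch below uses today's `tf.keras.optimizers` interface purely to illustrate the quoted hyperparameters.

```python
import tensorflow as tf

def baseline_optimizers(lr):
    """Build the three baseline optimizers at learning rate `lr`.

    Hyperparameters follow the quoted setup: momentum 0.99, and Adam with
    beta_1=0.9, beta_2=0.999, epsilon=1e-8 (its defaults).
    """
    return {
        "SGD": tf.keras.optimizers.SGD(learning_rate=lr),
        "MOMENTUM": tf.keras.optimizers.SGD(learning_rate=lr, momentum=0.99),
        "ADAM": tf.keras.optimizers.Adam(
            learning_rate=lr, beta_1=0.9, beta_2=0.999, epsilon=1e-8
        ),
    }
```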