Efficient Benchmarking of Hyperparameter Optimizers via Surrogates
Authors: Katharina Eggensperger, Frank Hutter, Holger Hoos, Kevin Leyton-Brown
AAAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated a wide range of regression techniques, both in terms of how well they predict the performance of new hyperparameter settings and in terms of the quality of surrogate benchmarks obtained. |
| Researcher Affiliation | Academia | Katharina Eggensperger and Frank Hutter, University of Freiburg ({eggenspk, fh}@cs.uni-freiburg.de); Holger H. Hoos and Kevin Leyton-Brown, University of British Columbia ({hoos, kevinlb}@cs.ubc.ca) |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | Our final surrogate benchmarks are freely available online at www.automl.org/benchmarks.html. This page provides the benchmarks themselves (trained models and data), but not the source code used to train the regression models. |
| Open Datasets | Yes | We experimented with nine benchmarks from the hyperparameter optimization benchmark library HPOLIB (Eggensperger et al. 2013), including three low-dimensional and six high-dimensional hyperparameter spaces. The low-dimensional benchmarks were derived from a logistic regression (Snoek, Larochelle, and Adams 2012) with 4 hyperparameters on the MNIST dataset (LeCun et al. 1998)... |
| Dataset Splits | Yes | We experimented with nine benchmarks from the hyperparameter optimization benchmark library HPOLIB (Eggensperger et al. 2013), including three low-dimensional and six high-dimensional hyperparameter spaces. The low-dimensional benchmarks were derived from a logistic regression (Snoek, Larochelle, and Adams 2012) with 4 hyperparameters on the MNIST dataset (LeCun et al. 1998) (both with and without 5-fold cross validation)... |
| Hardware Specification | Yes | The evaluation of a single configuration of the logistic regression required roughly 1 minute on a single core of an Intel Xeon E5-2650 v2 CPU... To run efficiently, the HP-DBNET required a GPGPU; on a modern GeForce GTX 780 GPU, it took roughly 15 minutes to evaluate a single configuration. |
| Software Dependencies | Yes | As a baseline, we also experimented with k-nearest-neighbours (kNN), linear regression, ridge regression, and two SVM methods (all as implemented by scikit-learn, version 0.15.1 (Pedregosa et al. 2011)). A minimal sketch of these baseline regressors appears after this table. |
| Experiment Setup | Yes | We experimented with nine benchmarks from the hyperparameter optimization benchmark library HPOLIB (Eggensperger et al. 2013)... For each benchmark, we executed 10 runs each of SMAC, SPEARMINT, TPE and random search (using the Hyperopt implementation of both random search and TPE)... We used random search to optimize hyperparameters and considered 100 samples over the stated hyperparameters; we trained the model on 50% of the data, chose the best configuration based on its performance on the other 50%, and then trained on all data. A sketch of this 50/50 selection protocol also appears after the table. |
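
The baseline regressors quoted under Software Dependencies are standard scikit-learn estimators. The sketch below fits them on hypothetical (hyperparameter configuration, performance) pairs and scores them by RMSE. The data shapes, the 4-dimensional configuration space, and the choice of RBF and linear kernels for the "two SVM methods" are illustrative assumptions, not the authors' exact pipeline; the paper used scikit-learn 0.15.1, while this sketch uses the current API.

```python
# Sketch: fit the paper's baseline regressors as surrogates that map
# hyperparameter configurations to observed performance. All data here
# is synthetic and for illustration only.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 4))   # e.g. 4 logistic-regression hyperparameters
y = rng.uniform(size=500)        # observed validation error per configuration

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

baselines = {
    "kNN": KNeighborsRegressor(n_neighbors=5),
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "SVR-rbf": SVR(kernel="rbf"),        # the "two SVM methods" are assumed
    "SVR-linear": SVR(kernel="linear"),  # to be SVR with different kernels
}
for name, model in baselines.items():
    model.fit(X_train, y_train)
    rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
    print(f"{name}: RMSE = {rmse:.3f}")
```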
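The Experiment Setup row describes a simple selection protocol: draw 100 hyperparameter settings at random, fit each on 50% of the data, pick the setting that performs best on the remaining 50%, then refit on all data. The sketch below follows that protocol for a hypothetical random-forest surrogate; the search space and data are invented for illustration and are not the paper's grids.

```python
# Sketch: the 100-sample random-search protocol with a 50/50 split for
# model selection, followed by a final refit on all data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(size=(400, 6))   # hypothetical configurations of a 6-d space
y = rng.uniform(size=400)        # hypothetical observed performances

X_fit, X_sel, y_fit, y_sel = train_test_split(X, y, test_size=0.5,
                                              random_state=0)

best_params, best_rmse = None, np.inf
for _ in range(100):             # 100 random samples, as in the paper
    params = {
        "n_estimators": int(rng.integers(10, 100)),
        "max_features": float(rng.uniform(0.1, 1.0)),
        "min_samples_split": int(rng.integers(2, 20)),
    }
    model = RandomForestRegressor(random_state=0, **params)
    model.fit(X_fit, y_fit)     # train on 50% of the data
    rmse = mean_squared_error(y_sel, model.predict(X_sel)) ** 0.5
    if rmse < best_rmse:        # select on the other 50%
        best_params, best_rmse = params, rmse

# Retrain the chosen configuration on all available data.
final_model = RandomForestRegressor(random_state=0, **best_params)
final_model.fit(X, y)
print("selected:", best_params, "held-out RMSE:", round(best_rmse, 3))
```

Selecting on a disjoint half of the data guards against choosing a configuration that merely overfits the training half; refitting on all data then recovers the sample efficiency lost to the split.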