Initializing Bayesian Hyperparameter Optimization via Meta-Learning

Authors: Matthias Feurer, Jost Springenberg, Frank Hutter

AAAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | To validate our approach, we perform extensive experiments with two established SMBO frameworks (Spearmint and SMAC) with complementary strengths; optimizing two machine learning frameworks on 57 datasets. |
| Researcher Affiliation | Academia | Matthias Feurer, Jost Tobias Springenberg, and Frank Hutter ({feurerm,springj,fh}@cs.uni-freiburg.de), Computer Science Department, University of Freiburg, Georges-Köhler-Allee 52, 79110 Freiburg, Germany |
| Pseudocode | Yes | Algorithm 1: Generic Sequential Model-based Optimization. SMBO(f_D, T, Θ, θ_{1:t}). (A minimal sketch of this loop is given after the table.) |
| Open Source Code | No | The paper mentions supplementary material for more results, but does not state that source code for the described methodology is publicly available. The text "for more results, please see the supplementary material: www.automl.org/aaai2015-mi-smbo-supplementary.pdf" refers to results, not code. |
| Open Datasets | Yes | We found the OpenML project (Vanschoren et al. 2013) to be the best source of datasets and used the 60 classification datasets it contained in April 2014. |
| Dataset Splits | Yes | We first shuffled each dataset and then split it in stratified fashion into 2/3 training and 1/3 test data. Then, we computed the validation performance for Bayesian optimization by ten-fold cross-validation on the training dataset. (A scikit-learn sketch of this protocol follows the table.) |
| Hardware Specification | No | The paper mentions that calculating the grid took up to three days per dataset "on a modern CPU" but provides no specific hardware details (e.g., CPU model, GPU, memory). |
| Software Dependencies | No | The paper mentions the use of the "scikit-learn package (Pedregosa et al. 2011)" and the "WEKA package (Hall et al. 2009)" but does not specify version numbers for these or any other software components. |
| Experiment Setup | Yes | To keep the computation bearable and the results interpretable, we only included three classification algorithms: an SVM with an RBF kernel, a linear SVM, and random forests. Since we expected noise and redundancies in the training data, we also allowed the optimization procedure to use Principal Component Analysis (PCA) for preprocessing, with the number of PCA components being conditional on PCA being applied. In total, this led to 10 hyperparameters, as detailed in Table 2. (A sketch of such a conditional search space follows the table.) |
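
The Pseudocode row refers to the paper's Algorithm 1, the generic SMBO loop that MI-SMBO seeds with a meta-learned initial design θ_{1:t}. The following is a minimal sketch of such a loop, not the authors' implementation: the one-dimensional search space, the Gaussian-process surrogate, the expected-improvement acquisition, and all function names are illustrative assumptions.

```python
# Minimal sketch of a generic SMBO loop (cf. Algorithm 1), assuming a
# single continuous hyperparameter, a Gaussian-process surrogate, and
# expected improvement as the acquisition function. All names are
# illustrative; in MI-SMBO the initial design theta_init would come
# from meta-learning rather than being fixed by hand.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor


def expected_improvement(model, candidates, best_y):
    """EI for minimization: expected drop below the current best value."""
    mu, sigma = model.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (best_y - mu) / sigma
    return (best_y - mu) * norm.cdf(z) + sigma * norm.pdf(z)


def smbo(f, theta_init, budget, bounds=(0.0, 1.0), seed=0):
    """Evaluate the initial design, then iterate fit -> propose -> evaluate."""
    rng = np.random.default_rng(seed)
    X = [[t] for t in theta_init]
    y = [f(t) for t in theta_init]
    for _ in range(budget - len(theta_init)):
        model = GaussianProcessRegressor().fit(X, y)
        cand = rng.uniform(*bounds, size=(512, 1))
        theta = cand[np.argmax(expected_improvement(model, cand, min(y)))]
        X.append(list(theta))
        y.append(f(theta[0]))
    return X[int(np.argmin(y))], min(y)


if __name__ == "__main__":
    best_x, best_y = smbo(lambda t: (t - 0.3) ** 2, theta_init=[0.1, 0.9], budget=15)
    print(best_x, best_y)
```

In the paper, the initial configurations are chosen by looking up the best configurations of the datasets most similar to the new one (measured via meta-features), which is the only change to this otherwise standard loop.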
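
The Dataset Splits row describes a stratified 2/3 vs. 1/3 train/test split with ten-fold cross-validation on the training portion as the validation signal. A rough scikit-learn sketch of that protocol, under the assumption of a placeholder dataset and classifier (neither is from the paper), is shown below.

```python
# Sketch of the evaluation protocol quoted above: stratified 2/3 / 1/3
# train/test split, then ten-fold cross-validation on the training data
# to obtain the validation performance used by Bayesian optimization.
# The digits dataset and the random forest are placeholders only.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, stratify=y, shuffle=True, random_state=0
)

clf = RandomForestClassifier(random_state=0)
val_scores = cross_val_score(clf, X_train, y_train, cv=10)    # validation signal
test_score = clf.fit(X_train, y_train).score(X_test, y_test)  # held-out test score
print(val_scores.mean(), test_score)
```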
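
The Experiment Setup row describes a conditional search space: a choice among three classifiers plus optional PCA preprocessing, where the number of PCA components is only active when PCA is enabled. The sketch below illustrates such a conditional space with a plain random sampler; the hyperparameter names and ranges are assumptions and do not reproduce the paper's Table 2.

```python
# Illustrative conditional search space in the spirit of Table 2:
# pick a classifier (RBF SVM, linear SVM, or random forest), decide
# whether to apply PCA, and only sample hyperparameters that are
# active for those choices. Ranges below are placeholders.
import random


def sample_configuration(rng=random):
    config = {"classifier": rng.choice(["rbf_svm", "linear_svm", "random_forest"])}

    if config["classifier"] == "rbf_svm":
        config["C"] = 10 ** rng.uniform(-5, 15)      # log-scale cost (assumed range)
        config["gamma"] = 10 ** rng.uniform(-15, 3)  # RBF width (assumed range)
    elif config["classifier"] == "linear_svm":
        config["C"] = 10 ** rng.uniform(-5, 15)
    else:  # random forest
        config["n_estimators"] = rng.randint(10, 100)
        config["max_features"] = rng.uniform(0.1, 1.0)

    # PCA is optional; the number of components is conditional on PCA being used.
    config["use_pca"] = rng.choice([True, False])
    if config["use_pca"]:
        config["pca_components"] = rng.uniform(0.5, 0.99)  # fraction of variance kept

    return config


print(sample_configuration())
```

Structuring the space this way is what makes some of the 10 hyperparameters conditional: an SMBO tool such as SMAC treats inactive hyperparameters as absent rather than sampling them uniformly.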