Few-Shot Bayesian Optimization with Deep Kernel Surrogates

Authors: Martin Wistuba, Josif Grabocka

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | All of our contributions are empirically compared with several competitive methods in three different problems. Two ablation studies provide information about the influence of the individual components. We used three different optimization problems to compare the different hyperparameter optimization methods: AdaBoost, GLMNet, and SVM. We report the aggregated results for all tasks within one problem class with respect to the mean of normalized regrets.
Researcher Affiliation | Collaboration | Martin Wistuba (IBM Research, Dublin, Ireland, martin.wistuba@ibm.com); Josif Grabocka (University of Freiburg, Freiburg, Germany, grabocka@cs.uni-freiburg.de)
Pseudocode | Yes | Algorithm 1: Few-Shot GP Surrogate
Open Source Code | No | The paper links to code used by other researchers for comparison (e.g., 'Using the authors' code (https://github.com/boschresearch/MetaBO), we executed the same scripts used by the authors to report their results for T = 15 trials...'), but does not provide a link or statement about open-sourcing the code for its own method.
Open Datasets | Yes | We created the GLMNet and SVM metadata set by downloading the 30 data sets with the most reported hyperparameter settings from OpenML for each problem. The AdaBoost data set is publicly available (Wistuba et al., 2018). Table 3: Metadata set statistics. OpenML ID refers to a task on openml.org.
Dataset Splits | Yes | The experiments are repeated ten times and evaluated in a leave-one-task-out cross-validation.
Hardware Specification | No | The paper does not mention any specific hardware used for running the experiments (e.g., GPU/CPU models, memory, or cloud instances).
Software Dependencies | No | The paper mentions software components like the 'Adam optimizer' and the 'scikit-optimize implementation' but does not provide specific version numbers for any of them.
Experiment Setup | Yes | The deep kernel is composed of a two-layer neural network (128, 128) with ReLU activations and a squared-exponential kernel. We use the Adam optimizer with learning rate 10^-3 and a batch size of fifty. The warm start length is five. (Illustrative sketches of the deep kernel surrogate and the evaluation protocol follow this table.)
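
The pseudocode and experiment setup rows above describe a deep kernel surrogate: a two-layer (128, 128) ReLU network whose output features feed a squared-exponential (RBF) kernel, with network weights and GP hyperparameters trained jointly by Adam at learning rate 10^-3. The sketch below illustrates that construction in GPyTorch. The class names, the training loop, and the single-task setting are assumptions made here for illustration, not the authors' released code, and the few-shot meta-learning across tasks (batch size fifty, warm start length five) is omitted.

```python
# Minimal deep kernel GP sketch (illustrative assumption, not the authors' code).
# Requires: torch, gpytorch.
import torch
import gpytorch


class FeatureNet(torch.nn.Module):
    """Two-layer (128, 128) ReLU network, per the experiment setup row above."""

    def __init__(self, input_dim):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(input_dim, 128), torch.nn.ReLU(),
            torch.nn.Linear(128, 128), torch.nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)


class DeepKernelGP(gpytorch.models.ExactGP):
    """Exact GP whose squared-exponential (RBF) kernel acts on learned features."""

    def __init__(self, train_x, train_y, likelihood, feature_net):
        super().__init__(train_x, train_y, likelihood)
        self.feature_net = feature_net
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        z = self.feature_net(x)
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(z), self.covar_module(z)
        )


def fit_surrogate(train_x, train_y, steps=200):
    """Jointly train network weights and GP hyperparameters with Adam (lr 1e-3)."""
    likelihood = gpytorch.likelihoods.GaussianLikelihood()
    model = DeepKernelGP(train_x, train_y, likelihood, FeatureNet(train_x.shape[-1]))
    model.train()
    likelihood.train()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = -mll(model(train_x), train_y)
        loss.backward()
        optimizer.step()
    return model, likelihood
```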
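
The evaluation rows above (leave-one-task-out cross-validation repeated ten times, results aggregated as the mean of normalized regrets) can be made concrete with a short sketch. The regret normalization used here (best value found minus the task optimum, scaled by the task's response range) is a common convention and is an assumption for illustration, not a formula quoted from the paper; the run_optimizer hook is hypothetical.

```python
# Illustrative sketch of leave-one-task-out evaluation with mean normalized regret.
# The normalization convention and the run_optimizer hook are assumptions for
# illustration; neither is quoted from the paper.
import numpy as np


def normalized_regret(best_found, task_min, task_max):
    """Regret of the best value found so far, scaled to [0, 1] by the task's range."""
    return (best_found - task_min) / (task_max - task_min)


def leave_one_task_out(tasks, run_optimizer, trials=15, repeats=10):
    """Hold out each task in turn, meta-train on the rest, and average regrets."""
    per_task = []
    for held_out in tasks:
        train_tasks = [t for t in tasks if t is not held_out]
        regrets = []
        for _ in range(repeats):
            # run_optimizer is a hypothetical hook: meta-train a surrogate on
            # train_tasks, run `trials` BO iterations on held_out, and return the
            # best objective value observed (lower is better).
            best_found = run_optimizer(train_tasks, held_out, trials)
            regrets.append(normalized_regret(best_found, held_out["min"], held_out["max"]))
        per_task.append(float(np.mean(regrets)))
    return float(np.mean(per_task))  # mean of normalized regrets across held-out tasks
```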