Few-Shot Bayesian Optimization with Deep Kernel Surrogates
Authors: Martin Wistuba, Josif Grabocka
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | All of our contributions are empirically compared with several competitive methods in three different problems. Two ablation studies provide information about the influence of the individual components. We used three different optimization problems to compare the different hyperparameter optimization methods: AdaBoost, GLMNet, and SVM. We report the aggregated results for all tasks within one problem class with respect to the mean of normalized regrets. (A sketch of this normalized-regret aggregation appears after the table.) |
| Researcher Affiliation | Collaboration | Martin Wistuba, IBM Research, Dublin, Ireland (martin.wistuba@ibm.com); Josif Grabocka, University of Freiburg, Freiburg, Germany (grabocka@cs.uni-freiburg.de) |
| Pseudocode | Yes | Algorithm 1: Few-Shot GP Surrogate (a hedged sketch of the surrogate's meta-training loop follows the table) |
| Open Source Code | No | The paper links to code used by other researchers for comparison (e.g., 'Using the authors' code, we executed the same scripts used by the authors to report their results for T = 15 trials...', with the referenced code at https://github.com/boschresearch/MetaBO), but does not provide a link or statement about open-sourcing its own methodology's code. |
| Open Datasets | Yes | We created the GLMNet and SVM metadata set by downloading the 30 data sets with the most reported hyperparameter settings from OpenML for each problem. The AdaBoost data set is publicly available (Wistuba et al., 2018). Table 3: Metadata set statistics. OpenML ID refers to a task on openml.org. |
| Dataset Splits | Yes | The experiments are repeated ten times and evaluated in a leave-one-task-out cross-validation. (A sketch of this split protocol appears after the table.) |
| Hardware Specification | No | The paper does not mention any specific hardware used for running the experiments (e.g., GPU/CPU models, memory, or cloud instances). |
| Software Dependencies | No | The paper mentions software components like 'Adam optimizer' and 'scikit-optimize implementation' but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | The deep kernel is composed of a two-layer neural network (128, 128) with ReLU activations and a squared-exponential kernel. We use the Adam optimizer with learning rate 10^-3 and a batch size of fifty. The warm start length is five. (A sketch of this deep kernel configuration appears after the table.) |
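
As a reading aid for the experiment-setup row, here is a minimal sketch of a deep kernel GP surrogate matching the quoted configuration: a two-layer (128, 128) ReLU feature extractor feeding a squared-exponential (RBF) kernel. It assumes PyTorch and GPyTorch; the class names, output feature dimension, and any normalization are my own choices, not the authors' released implementation.

```python
import torch
import gpytorch


class FeatureExtractor(torch.nn.Module):
    """Two hidden layers of width 128 with ReLU activations, as quoted above."""

    def __init__(self, input_dim: int):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(input_dim, 128), torch.nn.ReLU(),
            torch.nn.Linear(128, 128), torch.nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)


class DeepKernelGP(gpytorch.models.ExactGP):
    """GP whose squared-exponential kernel acts on the learned features."""

    def __init__(self, train_x, train_y, likelihood, feature_extractor):
        super().__init__(train_x, train_y, likelihood)
        self.feature_extractor = feature_extractor
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        # Map raw hyperparameter configurations through the network, then
        # place the GP prior on the learned feature representation.
        z = self.feature_extractor(x)
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(z), self.covar_module(z)
        )
```

The Adam learning rate of 10^-3 and batch size of fifty from the same row are picked up in the training sketch below.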
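
The 'Algorithm 1: Few-Shot GP Surrogate' row refers to the paper's meta-training procedure. The loop below is only a hedged approximation of that idea, reusing the DeepKernelGP sketch above: sample a source task, draw a batch of its evaluated configurations, and maximize the GP marginal log likelihood with Adam (learning rate 10^-3, batch size 50, per the experiment-setup row). The task sampling, step count, and data handling are assumptions, not the paper's exact Algorithm 1.

```python
import random

import torch

# `meta_data` maps a task id to (configs, responses) tensors from that task's
# recorded hyperparameter evaluations; `model` is a DeepKernelGP instance and
# `mll` its gpytorch.mlls.ExactMarginalLogLikelihood (see the sketch above).


def meta_train_surrogate(model, mll, meta_data, steps=10_000, batch_size=50):
    """Meta-train the deep kernel across source tasks by maximizing the
    GP marginal log likelihood on random mini-batches of observations."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    mll.train()  # puts both the GP and its likelihood into training mode
    for _ in range(steps):
        X, y = meta_data[random.choice(list(meta_data))]  # pick a source task
        idx = torch.randperm(len(y))[:batch_size]         # draw a batch
        x_b, y_b = X[idx], y[idx]
        model.set_train_data(x_b, y_b, strict=False)      # condition on it
        optimizer.zero_grad()
        loss = -mll(model(x_b), y_b)
        loss.backward()
        optimizer.step()
    return model
```

The few-shot aspect is that, on the held-out target task, the meta-learned surrogate is then fine-tuned with the same objective on only the handful of observations collected so far during Bayesian optimization.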
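
For the dataset-splits row, the leave-one-task-out protocol can be written as a small split generator. The sketch below assumes 30 tasks per problem class and ten repetitions with different seeds, matching the quoted statistics; function and variable names are illustrative.

```python
def leave_one_task_out(task_ids):
    """Yield (meta-train tasks, held-out target task) pairs."""
    for held_out in task_ids:
        yield [t for t in task_ids if t != held_out], held_out


# Ten repetitions of leave-one-task-out over, e.g., the 30 SVM tasks:
task_ids = list(range(30))
for seed in range(10):
    for meta_train_tasks, target_task in leave_one_task_out(task_ids):
        # meta-train the surrogate on `meta_train_tasks`,
        # then run Bayesian optimization on `target_task` with this seed
        pass
```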
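
The research-type row quotes the paper's aggregation metric, the mean of normalized regrets over all tasks of a problem class. The sketch below shows one common way to compute such a quantity; the exact per-task normalization constants used by the authors are not reproduced here, and all names are placeholders.

```python
import numpy as np


def normalized_regret(incumbent_trace, y_best, y_worst):
    """Regret of the best-so-far observation at each trial, rescaled to [0, 1].

    incumbent_trace: best validation error observed up to each trial.
    y_best, y_worst: best and worst responses known for the task, so that
    tasks with different error scales can be averaged together.
    """
    trace = np.asarray(incumbent_trace, dtype=float)
    return (trace - y_best) / (y_worst - y_best)


def mean_normalized_regret(traces, bounds):
    """Average the per-task normalized-regret curves of one problem class."""
    curves = [normalized_regret(t, lo, hi) for t, (lo, hi) in zip(traces, bounds)]
    return np.mean(np.stack(curves), axis=0)
```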