Meta-Surrogate Benchmarking for Hyperparameter Optimization

Authors: Aaron Klein, Zhenwen Dai, Frank Hutter, Neil Lawrence, Javier González

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This work proposes a method to alleviate these issues by means of a meta-surrogate model for HPO tasks trained on off-line generated data. The model combines a probabilistic encoder with a multi-task model such that it can generate inexpensive and realistic tasks of the class of problems of interest. We demonstrate that benchmarking HPO methods on samples of the generative model allows us to draw more coherent and statistically significant conclusions that can be reached orders of magnitude faster than using the original tasks. We provide evidence of our findings for various HPO methods on a wide class of problems. (A minimal sketch of this generative setup appears after the table.)
Researcher Affiliation | Collaboration | Aaron Klein (1), Zhenwen Dai (2), Frank Hutter (1), Neil Lawrence (3), Javier González (2); (1) University of Freiburg, (2) Amazon Cambridge, (3) University of Cambridge
Pseudocode | Yes | We provide pseudo code in Appendix G.
Open Source Code | Yes | We now present our PRObabilistic data-eFficient experimentation tool, called PROFET, a benchmarking suite for HPO methods (an open-source implementation is available here: https://github.com/amzn/emukit).
Open Datasets | Yes | For classification, we considered a support vector machine (SVM) with D = 2 hyperparameters and a feed forward neural network (FC-Net) with D = 6 hyperparameters on 16 OpenML [41] tasks each. We used gradient boosting (XGBoost) with D = 8 hyperparameters for regression on 11 different UCI datasets [30].
Dataset Splits | No | The paper describes data collection by drawing '100D pseudo randomly generated configurations from a Sobol grid' and training the meta-model on '9 selected tasks.' However, it does not provide specific percentages or counts for training, validation, and test splits for the datasets or tasks used in the experiments, nor does it refer to standard predefined splits with sufficient detail for reproduction. (A sketch of this data-collection step follows the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. General terms like 'wall-clock time' are mentioned, but no hardware is described.
Software Dependencies | No | The paper mentions various software components and libraries, such as 'XGBoost', 'HPOlib', 'SMAC', the 'Hyperopt' package, 'pycma', and the 'RoBO' package, and provides links to some implementations. However, it does not specify version numbers for any of these software dependencies.
Experiment Setup | Yes | We conducted 20 independent runs for each method on every task of all three problem classes described in Section 4.1 with different random seeds. Each method had a budget of 200 function evaluations per task, except for BO-GP and BOHAMIANN, where, due to their computational overhead, we were only able to perform 100 function evaluations. (A sketch of this benchmarking protocol appears after the table.)
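
The Research Type row above describes PROFET's core idea: a probabilistic encoder maps each HPO task to a latent vector, and a multi-task surrogate conditioned on that vector yields cheap, realistic objective functions. The Python sketch below is a minimal, hypothetical illustration of that generative setup; the standard-normal latent prior, the quadratic toy surrogate, and the `sample_task` helper are assumptions for illustration only, not the authors' Bayesian-neural-network implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical latent task space: PROFET learns a distribution over task
# vectors from offline HPO data with a probabilistic encoder. Here a standard
# normal prior stands in for that learned distribution.
LATENT_DIM = 3

def sample_task(rng):
    """Sample a cheap synthetic HPO task f(x) conditioned on a latent vector h."""
    h = rng.standard_normal(LATENT_DIM)
    # Toy multi-task surrogate: a quadratic bowl whose optimum and curvature
    # depend on h. The paper's model is a multi-task model fit to real
    # benchmark evaluations, not this toy function.
    optimum = 0.5 + 0.1 * h[:2]        # per-task optimum in the unit square
    scale = np.exp(0.2 * h[2])         # per-task curvature

    def objective(x):
        x = np.asarray(x, dtype=float)
        return float(scale * np.sum((x - optimum) ** 2))

    return objective

# Draw a handful of inexpensive benchmark tasks and evaluate one configuration.
tasks = [sample_task(rng) for _ in range(5)]
x = np.array([0.4, 0.6])
print([round(f(x), 3) for f in tasks])
```

Benchmarking an HPO method then amounts to optimizing many such sampled objectives, which, as the abstract notes, is orders of magnitude faster than evaluating the original tasks.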
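The Open Datasets and Dataset Splits rows quote the paper's offline data collection: for each task, roughly 100·D configurations are drawn from a Sobol grid and evaluated. The sketch below shows one hedged way such a step could look for the 2-D SVM benchmark; the OpenML dataset name ('phoneme'), the log-scale bounds for C and gamma, and the train/validation split are illustrative assumptions, since the paper defines its exact search spaces and task list in the appendix.

```python
import numpy as np
from scipy.stats import qmc
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Illustrative dataset: the paper uses 16 OpenML classification tasks, but it
# does not specify the loading code; 'phoneme' is only a convenient stand-in.
data = fetch_openml(name="phoneme", version=1, as_frame=False)
X_train, X_val, y_train, y_val = train_test_split(
    data.data, data.target, test_size=0.33, random_state=0
)

D = 2            # the SVM benchmark has two hyperparameters (C and gamma)
N = 100 * D      # "100 D ... configurations from a Sobol grid", as quoted above

# Assumed log10 bounds for C and gamma; the paper specifies the exact ranges.
lower, upper = np.array([-10.0, -10.0]), np.array([10.0, 10.0])

sobol = qmc.Sobol(d=D, scramble=True, seed=0)
unit = sobol.random(N)                          # points in [0, 1)^D
configs = 10.0 ** qmc.scale(unit, lower, upper)

# Evaluate configurations to build offline training data for the meta-model
# (only a few here, and on a subsample, to keep the sketch fast).
records = []
for C, gamma in configs[:5]:
    clf = SVC(C=C, gamma=gamma).fit(X_train[:500], y_train[:500])
    records.append(((float(C), float(gamma)), 1.0 - clf.score(X_val, y_val)))
print(records)
```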
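The Experiment Setup row specifies 20 independent runs per method and task, each with a budget of 200 function evaluations (100 for BO-GP and BOHAMIANN because of their overhead). The loop below sketches that bookkeeping with random search as a hypothetical stand-in optimizer and a toy objective; it illustrates only the seeding and budget protocol, not the HPO methods actually benchmarked in the paper.

```python
import numpy as np

N_RUNS = 20        # independent runs per method and task
BUDGET = 200       # function evaluations per run (100 for BO-GP / BOHAMIANN)

def toy_objective(x):
    # Placeholder objective; in PROFET this would be a task sampled from the
    # meta-surrogate model.
    x = np.asarray(x, dtype=float)
    return float(np.sum((x - 0.3) ** 2))

def random_search(objective, budget, rng):
    """Stand-in optimizer: returns the incumbent (best-so-far) trajectory."""
    best, trajectory = np.inf, []
    for _ in range(budget):
        x = rng.uniform(0.0, 1.0, size=2)       # toy 2-D search space
        best = min(best, objective(x))
        trajectory.append(best)
    return trajectory

# One run per seed; aggregating trajectories across seeds and tasks is what
# lets the benchmark compare methods with statistical significance.
runs = np.asarray([
    random_search(toy_objective, BUDGET, np.random.default_rng(seed))
    for seed in range(N_RUNS)
])                                              # shape: (N_RUNS, BUDGET)
print("mean final incumbent:", runs[:, -1].mean())
```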