Meta-Surrogate Benchmarking for Hyperparameter Optimization

Authors: Aaron Klein, Zhenwen Dai, Frank Hutter, Neil Lawrence, Javier González

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This work proposes a method to alleviate these issues by means of a meta-surrogate model for HPO tasks trained on off-line generated data. The model combines a probabilistic encoder with a multi-task model such that it can generate inexpensive and realistic tasks of the class of problems of interest. We demonstrate that benchmarking HPO methods on samples of the generative model allows us to draw more coherent and statistically significant conclusions that can be reached orders of magnitude faster than using the original tasks. We provide evidence of our findings for various HPO methods on a wide class of problems. (A minimal sketch of this generative setup appears after the table.)
Researcher Affiliation | Collaboration | Aaron Klein (1), Zhenwen Dai (2), Frank Hutter (1), Neil Lawrence (3), Javier González (2); (1) University of Freiburg, (2) Amazon Cambridge, (3) University of Cambridge
Pseudocode | Yes | We provide pseudo code in Appendix G.
Open Source Code | Yes | We now present our PRObabilistic data-eFficient experimentation tool, called PROFET, a benchmarking suite for HPO methods (an open-source implementation is available here: https://github.com/amzn/emukit).
Open Datasets | Yes | For classification, we considered a support vector machine (SVM) with D = 2 hyperparameters and a feed forward neural network (FC-Net) with D = 6 hyperparameters on 16 OpenML [41] tasks each. We used gradient boosting (XGBoost) with D = 8 hyperparameters for regression on 11 different UCI datasets [30].
Dataset Splits | No | The paper describes data collection by drawing '100D pseudo randomly generated configurations from a Sobol grid' and training the meta-model on '9 selected tasks.' However, it does not provide specific percentages or counts for training, validation, and test splits for the datasets or tasks used in the experiments, nor does it refer to standard predefined splits with sufficient detail for reproduction. (A sketch of this data-collection step follows the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. General terms like 'wall-clock time' are mentioned, but no hardware is described.
Software Dependencies | No | The paper mentions various software components and libraries, such as 'XGBoost', 'HPOlib', 'SMAC', the 'Hyperopt' package, 'pycma', and the 'RoBO' package, and provides links to some implementations. However, it does not specify version numbers for any of these software dependencies.
Experiment Setup | Yes | We conducted 20 independent runs for each method on every task of all three problem classes described in Section 4.1 with different random seeds. Each method had a budget of 200 function evaluations per task, except for BO-GP and BOHAMIANN, where, due to their computational overhead, we were only able to perform 100 function evaluations. (A sketch of this benchmarking protocol appears after the table.)
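
The Research Type row above describes PROFET's core idea: a probabilistic encoder maps each HPO task to a latent vector, and a multi-task surrogate conditioned on that vector yields cheap, realistic objective functions. The Python sketch below is a minimal, hypothetical illustration of that generative setup; the standard-normal latent prior, the quadratic toy surrogate, and the `sample_task` helper are assumptions for illustration only, not the authors' Bayesian-neural-network implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical latent task space: PROFET learns a distribution over task
# vectors from offline HPO data with a probabilistic encoder. Here a standard
# normal prior stands in for that learned distribution.
LATENT_DIM = 3

def sample_task(rng):
    """Sample a cheap synthetic HPO task f(x) conditioned on a latent vector h."""
    h = rng.standard_normal(LATENT_DIM)
    # Toy multi-task surrogate: a quadratic bowl whose optimum and curvature
    # depend on h. The paper's model is a multi-task model fit to real
    # benchmark evaluations, not this toy function.
    optimum = 0.5 + 0.1 * h[:2]        # per-task optimum in the unit square
    scale = np.exp(0.2 * h[2])         # per-task curvature

    def objective(x):
        x = np.asarray(x, dtype=float)
        return float(scale * np.sum((x - optimum) ** 2))

    return objective

# Draw a handful of inexpensive benchmark tasks and evaluate one configuration.
tasks = [sample_task(rng) for _ in range(5)]
x = np.array([0.4, 0.6])
print([round(f(x), 3) for f in tasks])
```

Benchmarking an HPO method then amounts to optimizing many such sampled objectives, which, as the abstract notes, is orders of magnitude faster than evaluating the original tasks.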
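The Open Datasets and Dataset Splits rows quote the paper's offline data collection: for each task, roughly 100·D configurations are drawn from a Sobol grid and evaluated. The sketch below shows one hedged way such a step could look for the 2-D SVM benchmark; the OpenML dataset name ('phoneme'), the log-scale bounds for C and gamma, and the train/validation split are illustrative assumptions, since the paper defines its exact search spaces and task list in the appendix.

```python
import numpy as np
from scipy.stats import qmc
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Illustrative dataset: the paper uses 16 OpenML classification tasks, but it
# does not specify the loading code; 'phoneme' is only a convenient stand-in.
data = fetch_openml(name="phoneme", version=1, as_frame=False)
X_train, X_val, y_train, y_val = train_test_split(
    data.data, data.target, test_size=0.33, random_state=0
)

D = 2            # the SVM benchmark has two hyperparameters (C and gamma)
N = 100 * D      # "100 D ... configurations from a Sobol grid", as quoted above

# Assumed log10 bounds for C and gamma; the paper specifies the exact ranges.
lower, upper = np.array([-10.0, -10.0]), np.array([10.0, 10.0])

sobol = qmc.Sobol(d=D, scramble=True, seed=0)
unit = sobol.random(N)                          # points in [0, 1)^D
configs = 10.0 ** qmc.scale(unit, lower, upper)

# Evaluate configurations to build offline training data for the meta-model
# (only a few here, and on a subsample, to keep the sketch fast).
records = []
for C, gamma in configs[:5]:
    clf = SVC(C=C, gamma=gamma).fit(X_train[:500], y_train[:500])
    records.append(((float(C), float(gamma)), 1.0 - clf.score(X_val, y_val)))
print(records)
```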
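The Experiment Setup row specifies 20 independent runs per method and task, each with a budget of 200 function evaluations (100 for BO-GP and BOHAMIANN because of their overhead). The loop below sketches that bookkeeping with random search as a hypothetical stand-in optimizer and a toy objective; it illustrates only the seeding and budget protocol, not the HPO methods actually benchmarked in the paper.

```python
import numpy as np

N_RUNS = 20        # independent runs per method and task
BUDGET = 200       # function evaluations per run (100 for BO-GP / BOHAMIANN)

def toy_objective(x):
    # Placeholder objective; in PROFET this would be a task sampled from the
    # meta-surrogate model.
    x = np.asarray(x, dtype=float)
    return float(np.sum((x - 0.3) ** 2))

def random_search(objective, budget, rng):
    """Stand-in optimizer: returns the incumbent (best-so-far) trajectory."""
    best, trajectory = np.inf, []
    for _ in range(budget):
        x = rng.uniform(0.0, 1.0, size=2)       # toy 2-D search space
        best = min(best, objective(x))
        trajectory.append(best)
    return trajectory

# One run per seed; aggregating trajectories across seeds and tasks is what
# lets the benchmark compare methods with statistical significance.
runs = np.asarray([
    random_search(toy_objective, BUDGET, np.random.default_rng(seed))
    for seed in range(N_RUNS)
])                                              # shape: (N_RUNS, BUDGET)
print("mean final incumbent:", runs[:, -1].mean())
```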