Weighted Sampling for Combined Model Selection and Hyperparameter Tuning

Authors: Dimitrios Sarigiannis, Thomas Parnell, Haralampos Pozidis

AAAI 2020, pp. 5595–5603

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We select three popular model-free hyperparameter tuning algorithms and perform a large empirical study, using 67 datasets from OpenML (Vanschoren et al. 2013), with uniform sampling as well as the proposed scheme.
Researcher Affiliation | Industry | Dimitrios Sarigiannis, Thomas Parnell, Haralampos Pozidis. IBM Research, Säumerstrasse 4, 8803 Rüschlikon, Switzerland. saridimi@gmail.com, {tpa, hap}@zurich.ibm.com
Pseudocode | Yes | Algorithm 1 Successive Halving:
    Require: initial number of configurations n0, minimum resource r_min, scaling factor η, sampling distribution p(λ, α)
    1: s_max ← ⌊−log_η(r_min)⌋
    Ensure: n0 ≥ η^(s_max)
    2: T ← sample_configurations(n0, p(λ, α))
    3: for i ∈ {0, 1, ..., s_max} do
    4:     n_i ← ⌊n0 · η^(−i)⌋
    5:     r_i ← η^(−s_max + i)
    6:     L ← { eval_and_return_val_loss(θ, r_i) : θ ∈ T }
    7:     T ← top_k(T, L, ⌊n_i / η⌋)
    8: end for
    9: return the configuration with the smallest intermediate loss seen so far in T
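A minimal Python sketch of the successive-halving loop above, assuming hypothetical helpers `sample_configurations` (draws n configurations from p(λ, α)) and `eval_loss` (trains a configuration on a resource fraction and returns its validation loss); this illustrates Algorithm 1's control flow, not the authors' implementation.

```python
import math
import numpy as np

def successive_halving(sample_configurations, eval_loss, n0, r_min, eta):
    # sample_configurations(n): hypothetical helper drawing n configurations
    # from p(lambda, alpha); eval_loss(theta, r): hypothetical helper training
    # configuration theta with resource fraction r, returning validation loss.
    s_max = int(math.floor(-math.log(r_min, eta)))  # 1: s_max = floor(-log_eta(r_min))
    assert n0 >= eta ** s_max                       # Ensure: n0 >= eta^s_max
    T = sample_configurations(n0)                   # 2: initial configurations
    best_loss, best_theta = math.inf, None
    for i in range(s_max + 1):                      # 3: rungs i = 0, ..., s_max
        n_i = int(n0 * eta ** (-i))                 # 4: survivors at this rung
        r_i = eta ** (-s_max + i)                   # 5: resource at this rung
        L = [eval_loss(theta, r_i) for theta in T]  # 6: evaluate every survivor
        j = int(np.argmin(L))
        if L[j] < best_loss:                        # track the smallest
            best_loss, best_theta = L[j], T[j]      # intermediate loss seen
        keep = max(1, int(n_i / eta))               # 7: keep the top n_i / eta
        T = [T[idx] for idx in np.argsort(L)[:keep]]
    return best_theta                               # 9: best configuration seen
```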
Open Source Code | No | No explicit statement about providing open-source code for the methodology, or a link to a code repository, was found.
Open Datasets | Yes | All datasets were obtained from the OpenML platform (Vanschoren et al. 2013) and their characteristics are summarized in Table 2. A complete list of OpenML dataset IDs is provided in Appendix A, and the pre-processing scheme used is provided in Appendix B.
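As a sketch of pulling one of those datasets, scikit-learn's `fetch_openml` retrieves OpenML data by ID; the ID below is an arbitrary example, since the paper's Appendix A list is not reproduced on this page.

```python
from sklearn.datasets import fetch_openml

# Example only: OpenML dataset ID 31 ("credit-g") stands in for the IDs
# listed in the paper's Appendix A, which this page does not reproduce.
bunch = fetch_openml(data_id=31)
X, y = bunch.data, bunch.target
print(X.shape, y.shape)
```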
Dataset Splits | Yes | Firstly, we create a stratified train/test split of each dataset. We then perform 10 different stratified splits of the training set to create a collection of 10 different train/validation sets.
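A sketch of that splitting protocol with scikit-learn; the 80/20 proportions and the random seeds are assumptions, since the quoted excerpt does not state them.

```python
from sklearn.model_selection import StratifiedShuffleSplit, train_test_split

# X, y as loaded in the previous snippet. The 80/20 ratios and seeds below
# are assumptions; the excerpt only specifies that the splits are stratified.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# 10 different stratified train/validation splits of the training set.
splitter = StratifiedShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
folds = list(splitter.split(X_train, y_train))  # 10 (train_idx, val_idx) pairs
```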
Hardware Specification | No | No hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments are mentioned in the paper.
Software Dependencies | Yes | For XGBoost we have used the xgboost v0.82 library and for the rest of the classifiers we have used scikit-learn v0.21.2. [...] All of the above methods are implemented in the R package SCMAMP (Calvo and Santafé Rodrigo 2016), which we will make extensive use of in the following section.
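As a sketch, the Python-side versions can be verified at runtime (the SCMAMP analysis is in R and is not covered here); pinning them this way is a suggestion on this page, not something the authors describe.

```python
# Check the dependency versions the paper reports before reproducing results.
import sklearn
import xgboost

assert sklearn.__version__ == "0.21.2", sklearn.__version__
assert xgboost.__version__ == "0.82", xgboost.__version__
```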
Experiment Setup | Yes | In our first comparison, we compare the three different SH schedules defined in Table 1 with a budget of 33, so that in the most explorative schedule n0 = 99 configurations are evaluated in the first rung. For each schedule, we evaluate SH with uniform model sampling and also with the weighted model sampling defined in equation (5). The hyperparameters for each model are sampled uniformly from a fixed range in both cases (possibly with some logarithmic transformations).
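A sketch contrasting the two sampling modes; the model families, weight values, and hyperparameter ranges below are placeholders, since the paper's equation (5) and Table 1 are not reproduced in this excerpt.

```python
import numpy as np

rng = np.random.RandomState(0)

# Placeholder model families. Uniform sampling uses equal probabilities;
# the paper's weighted scheme (its equation (5), not reproduced here)
# would supply non-uniform values such as the illustrative ones below.
MODELS = ["xgboost", "random_forest", "logistic_regression"]
UNIFORM = np.full(len(MODELS), 1.0 / len(MODELS))
WEIGHTED = np.array([0.5, 0.3, 0.2])  # illustrative weights only

def sample_configuration(model_weights):
    model = rng.choice(MODELS, p=model_weights)
    # Hyperparameters drawn uniformly from fixed (assumed) ranges, with a
    # logarithmic transformation for scale-sensitive parameters.
    if model == "xgboost":
        params = {"max_depth": rng.randint(1, 11),
                  "learning_rate": 10 ** rng.uniform(-3, 0)}  # log-uniform
    elif model == "random_forest":
        params = {"n_estimators": rng.randint(10, 501),
                  "max_features": rng.uniform(0.1, 1.0)}
    else:
        params = {"C": 10 ** rng.uniform(-4, 4)}              # log-uniform
    return model, params

# Most explorative schedule in the excerpt: n0 = 99 initial configurations.
configs = [sample_configuration(WEIGHTED) for _ in range(99)]
```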