Optimizing Hyperparameters with Conformal Quantile Regression

Authors: David Salinas, Jacek Golebiowski, Aaron Klein, Matthias Seeger, Cedric Archambeau

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We run empirical evaluations on a large set of benchmarks, demonstrating that quantile regression surrogates achieve a more robust performance compared to state-of-the-art methods in the single-fidelity case.
Researcher Affiliation | Industry | Amazon Web Services. Correspondence to: David Salinas <david.salinas.pro@gmail.com>.
Pseudocode | Yes | Algorithm 1: CQR candidate suggestion pseudo-code.
Open Source Code | Yes | The code to reproduce our results is available at https://github.com/geoalgo/syne-tune/tree/icml_conformal.
Open Datasets | Yes | Our experiments rely on 13 tasks coming from the FCNet (Klein & Hutter, 2019), NAS201 (Dong & Yang, 2020) and LCBench (Zimmer et al., 2021) benchmarks, as well as NAS301 (Siems et al., 2020) using the implementation provided in (Pfisterer et al., 2022).
Dataset Splits | Yes | D_train, D_val = split_train_val(D) (from Algorithm 1). All runs are repeated with 30 different random seeds (from Experiment Setup). In each case, we draw a random subset of size n to train the surrogate model and then evaluate the three metrics on the remaining unseen examples.
Hardware Specification | Yes | We use the simulation backend provided by Syne Tune (Salinas et al., 2022) on an AWS m5.4xlarge machine to simulate methods, which allows us to account for both optimizer and blackbox runtimes.
Software Dependencies | No | We use gradient boosted trees (Friedman, 2001) for the quantile-regression models... BORE is evaluated with XGBoost as the classifier... No specific version numbers for these software components are provided.
Experiment Setup | Yes | All tuning experiments run asynchronously with 4 workers and are stopped when 200 results at the maximum resource level r_max were observed, which corresponds to seeing 200 different configurations for single-fidelity methods, or when the wallclock time exceeded a fixed budget. All runs are repeated with 30 different random seeds.
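The quantile-regression surrogates listed under Software Dependencies are, in standard practice, trained by minimizing the pinball (quantile) loss. The minimal Python sketch below assumes that standard loss; the name pinball_loss is our own. It shows that, among constant predictions, the loss at level q is minimized at the empirical q-quantile of the data. One off-the-shelf gradient-boosted option is scikit-learn's GradientBoostingRegressor with loss="quantile" and alpha=q, although the source does not say which library the authors used.

```python
def pinball_loss(y_true, y_pred, q):
    """Average pinball (quantile) loss at quantile level q, with 0 < q < 1."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction (diff > 0) costs q * diff; over-prediction costs (1 - q) * -diff.
        total += max(q * diff, (q - 1) * diff)
    return total / len(y_true)


ys = [1.0, 2.0, 3.0, 4.0, 10.0]
for q in (0.5, 0.9):
    # Best constant prediction among the observed values.
    best = min(ys, key=lambda c: pinball_loss(ys, [c] * len(ys), q))
    print(q, best)  # prints "0.5 3.0", then "0.9 10.0"
```

The asymmetric weighting is what turns an ordinary regressor into a quantile regressor: raising q penalizes under-prediction more, pushing the fitted value toward the upper tail.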
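Algorithm 1 splits the observations D into D_train and D_val before suggesting a candidate. A minimal Python sketch of that idea, assuming standard split-conformal calibration of a central quantile interval: quantiles are estimated from D_train, conformity scores on D_val produce a correction that widens (or narrows) the interval, and a candidate is then picked for minimization. Several pieces are hypothetical and not taken from the paper: knn_quantile, a k-nearest-neighbour stand-in for the gradient boosted quantile regressors; the 75/25 split; the level alpha; and the optimistic "lowest calibrated lower quantile" acquisition rule, which is not necessarily the rule used in Algorithm 1.

```python
import math
import random


def empirical_quantile(values, q):
    """Smallest v in values such that at least a fraction q of the values are <= v."""
    s = sorted(values)
    idx = min(len(s) - 1, max(0, math.ceil(q * len(s)) - 1))
    return s[idx]


def knn_quantile(train, x, q, k=5):
    """Hypothetical quantile regressor: empirical q-quantile of the targets
    of the k training configurations nearest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], x))[:k]
    return empirical_quantile([y for _, y in nearest], q)


def conformal_correction(train, val, alpha, k=5):
    """Split-conformal correction for the central (1 - alpha) quantile interval."""
    lo, hi = alpha / 2, 1 - alpha / 2
    # Conformity score: how far y falls outside the predicted interval (negative if inside).
    scores = [
        max(knn_quantile(train, x, lo, k) - y, y - knn_quantile(train, x, hi, k))
        for x, y in val
    ]
    level = min(1.0, (1 - alpha) * (1 + 1 / len(scores)))
    return empirical_quantile(scores, level)


def suggest(data, candidates, alpha=0.2, k=5, seed=0):
    """Return the candidate with the lowest calibrated lower quantile, plus its interval."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_val = max(1, len(shuffled) // 4)  # hypothetical 75/25 split
    d_val, d_train = shuffled[:n_val], shuffled[n_val:]
    corr = conformal_correction(d_train, d_val, alpha, k)

    def interval(x):
        return (knn_quantile(d_train, x, alpha / 2, k) - corr,
                knn_quantile(d_train, x, 1 - alpha / 2, k) + corr)

    best = min(candidates, key=lambda x: interval(x)[0])
    return best, interval(best)


# Toy 1-D objective with its minimum at x = 0.3; configurations are tuples.
data = [((i / 20,), (i / 20 - 0.3) ** 2) for i in range(21)]
best, (lo, hi) = suggest(data, [(0.3,), (0.0,), (0.9,)])
print(best)  # (0.3,)
```

In this simplified version a single correction is shared by all candidates, so it shifts every lower bound equally and does not change which candidate wins; it does change the reported interval, which is what carries the calibrated uncertainty.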
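The stopping rule under Experiment Setup (4 asynchronous workers, stop after 200 results or once a wallclock budget is exceeded) can be illustrated with a small discrete-event simulation. This Python sketch assumes single-fidelity trials (one result per configuration), workers that start a new trial as soon as they free up, and hypothetical exponentially distributed trial runtimes with a 60-second mean. Unlike the Syne Tune simulation backend, it ignores the optimizer's own runtime. The name simulate_async, the runtime model and the default 3600-second budget are illustrative assumptions; only the 4 workers and 200 results come from the source.

```python
import heapq
import random


def simulate_async(n_workers=4, max_results=200, max_wallclock=3600.0, seed=0):
    """Count completed trials under asynchronous scheduling with two stopping rules.

    Returns (number of completed results, simulated time of the event that stopped the run).
    """
    rng = random.Random(seed)
    mean_runtime = 60.0  # hypothetical mean trial runtime in seconds

    # Each worker starts one trial at time 0; the heap holds (finish_time, worker_id).
    events = [(rng.expovariate(1 / mean_runtime), w) for w in range(n_workers)]
    heapq.heapify(events)

    n_results, now = 0, 0.0
    while events:
        now, worker = heapq.heappop(events)
        if now > max_wallclock:  # wallclock budget exceeded: stop
            break
        n_results += 1  # one more configuration evaluated
        if n_results >= max_results:  # result budget reached: stop
            break
        # Asynchronous: the freed worker immediately starts a new trial.
        heapq.heappush(events, (now + rng.expovariate(1 / mean_runtime), worker))
    return n_results, now


print(simulate_async(max_wallclock=float("inf"))[0])  # 200
print(simulate_async(max_wallclock=0.0)[0])  # 0
```

If trials are slow relative to the budget, the wallclock rule fires first and fewer than 200 configurations are evaluated.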