Optimizing Hyperparameters with Conformal Quantile Regression
Authors: David Salinas, Jacek Golebiowski, Aaron Klein, Matthias Seeger, Cedric Archambeau
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We run empirical evaluations on a large set of benchmarks, demonstrating that quantile regression surrogates achieve a more robust performance compared to state-of-the-art methods in the single-fidelity case |
| Researcher Affiliation | Industry | Amazon Web Services. Correspondence to: David Salinas <david.salinas.pro@gmail.com>. |
| Pseudocode | Yes | Algorithm 1 CQR candidate suggestion pseudo-code. |
| Open Source Code | Yes | The code to reproduce our results is available at https://github.com/geoalgo/syne-tune/tree/icml_conformal. |
| Open Datasets | Yes | Our experiments rely on 13 tasks coming from FCNet (Klein & Hutter, 2019), NAS201 (Dong & Yang, 2020) and LCBench (Zimmer et al., 2021) benchmarks as well as NAS301 (Siems et al., 2020) using the implementation provided in (Pfisterer et al., 2022). |
| Dataset Splits | Yes | D_train, D_val = split_train_val(D) (from Algorithm 1). All runs are repeated with 30 different random seeds (from Experiment Setup). In each case, we draw a random subset of size n to train the surrogate model and then evaluate the three metrics on remaining unseen examples. |
| Hardware Specification | Yes | We use the simulation backend provided by Syne Tune (Salinas et al., 2022) on an AWS m5.4xlarge machine to simulate methods, which allows us to account for both optimizer and blackbox runtimes. |
| Software Dependencies | No | We use gradient boosted trees (Friedman, 2001) for the quantile-regression models... BORE is evaluated with XGBoost as the classifier... No specific version numbers for these software components are provided. |
| Experiment Setup | Yes | All tuning experiments run asynchronously with 4 workers and are stopped when 200 · r_max results were observed, which corresponds to seeing 200 different configurations for single-fidelity methods, or when the wallclock time exceeded a fixed budget. All runs are repeated with 30 different random seeds. |
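
The Software Dependencies row notes that gradient boosted trees are used for the quantile-regression surrogates. Below is a minimal sketch of fitting such a surrogate, assuming scikit-learn's `GradientBoostingRegressor`; the paper does not state which implementation or quantile levels are used, so treat both as illustrative.

```python
# Minimal sketch: gradient-boosted quantile surrogates for an HPO objective.
# Assumes scikit-learn's GradientBoostingRegressor with the quantile loss;
# the paper only states that gradient boosted trees are used, so the exact
# implementation and quantile grid may differ.
from sklearn.ensemble import GradientBoostingRegressor

def fit_quantile_surrogates(X_train, y_train, quantiles=(0.1, 0.5, 0.9)):
    """Fit one gradient-boosted model per quantile level."""
    models = {}
    for q in quantiles:
        model = GradientBoostingRegressor(loss="quantile", alpha=q)
        model.fit(X_train, y_train)
        models[q] = model
    return models
```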
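
The Pseudocode and Dataset Splits rows reference Algorithm 1, which splits the observed data into D_train and D_val and conformalizes the quantile predictions on the held-out part. The sketch below follows the standard conformalized quantile regression correction of Romano et al. (2019); variable names and the miscoverage level `alpha` are illustrative and not taken from the authors' code.

```python
# Minimal sketch of the conformal step (Romano et al., 2019): shift the
# predicted interval [q_lo(x), q_hi(x)] by the finite-sample-corrected
# empirical quantile of the validation conformity scores.
import numpy as np

def conformalize(models, X_val, y_val, lo=0.1, hi=0.9, alpha=0.2):
    """Return the correction added to the upper and subtracted from the lower quantile."""
    q_lo = models[lo].predict(X_val)
    q_hi = models[hi].predict(X_val)
    # Conformity score: how far each validation point falls outside its interval.
    scores = np.maximum(q_lo - y_val, y_val - q_hi)
    n = len(y_val)
    # Finite-sample corrected quantile level, clipped to 1.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(scores, level)

def predict_interval(models, X, correction, lo=0.1, hi=0.9):
    """Conformalized prediction interval for new configurations X."""
    lower = models[lo].predict(X) - correction
    upper = models[hi].predict(X) + correction
    return lower, upper
```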
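
The Experiment Setup row describes asynchronous tuning with 4 workers, stopped after 200 · r_max observed results or a fixed wallclock budget. The experiments themselves run through Syne Tune's simulation backend; the stand-alone loop below, with hypothetical `suggest` and `evaluate` callables, only mirrors that stopping logic.

```python
# Illustrative asynchronous tuning loop: keep 4 workers busy, stop once the
# target number of results is reached or the wallclock budget is exhausted.
# `suggest` and `evaluate` are hypothetical stand-ins for the optimizer and
# the blackbox; the actual experiments use Syne Tune, not this loop.
import time
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def tune(suggest, evaluate, max_results=200, max_wallclock=3 * 3600, n_workers=4):
    start, results, pending = time.time(), [], set()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        while len(results) < max_results and time.time() - start < max_wallclock:
            # Keep all workers busy with fresh configurations.
            while len(pending) < n_workers:
                pending.add(pool.submit(evaluate, suggest(results)))
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            results.extend(f.result() for f in done)
    return results
```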