Supervising the Multi-Fidelity Race of Hyperparameter Configurations
Authors: Martin Wistuba, Arlind Kadra, Josif Grabocka
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the significant superiority of DyHPO against state-of-the-art hyperparameter optimization methods through large-scale experiments comprising 50 datasets (Tabular, Image, NLP) and diverse architectures (MLP, CNN/NAS, RNN). |
| Researcher Affiliation | Collaboration | Martin Wistuba Amazon Web Services, Berlin, Germany marwistu@amazon.com Arlind Kadra University of Freiburg, Freiburg, Germany kadraa@cs.uni-freiburg.de Josif Grabocka University of Freiburg, Freiburg, Germany grabocka@cs.uni-freiburg.de |
| Pseudocode | Yes | Algorithm 1 DYHPO Algorithm |
| Open Source Code | Yes | Our implementation of DYHPO is publicly available.3 (Footnote 3: https://github.com/releaunifreiburg/DyHPO) |
| Open Datasets | Yes | LCBench: A learning curve benchmark [Zimmer et al., 2021]... TaskSet: A benchmark that features diverse tasks Metz et al. [2020]... NAS-Bench-201: A benchmark consisting of 15625 hyperparameter configurations representing different architectures on the CIFAR-10, CIFAR-100 and ImageNet datasets Dong and Yang [2020]. |
| Dataset Splits | No | The paper describes the benchmarks used (LCBench, TaskSet, NAS-Bench-201) and their characteristics, but does not explicitly state the training/validation/test splits used for its experiments, nor does it point to specific predefined splits within those benchmarks that were used. |
| Hardware Specification | Yes | We ran all of our experiments on an Amazon EC2 M5 Instance (m5.xlarge). |
| Software Dependencies | No | The paper does not explicitly provide a list of specific software dependencies with their version numbers (e.g., Python, PyTorch, TensorFlow, or other libraries with version numbers). |
| Experiment Setup | Yes | For DYHPO, we use a constant learning rate of 0.1 for training the kernel parameters, and we train for 100 iterations per step. For all methods, we use a single learning rate of 0.001. |
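The optimizer settings quoted in the Experiment Setup row (a constant learning rate of 0.1 for training the kernel parameters, 100 iterations per step) can be illustrated with a minimal sketch. This is a generic Gaussian-process marginal-likelihood fit on toy data, not the paper's deep kernel: the RBF kernel, the synthetic data, and the noise level are assumptions made purely for illustration.

```python
import numpy as np

def rbf_kernel(X, log_ls):
    """RBF kernel on 1-D inputs with lengthscale exp(log_ls)."""
    d2 = (X[:, None] - X[None, :]) ** 2
    return np.exp(-0.5 * d2 / np.exp(2 * log_ls))

def nll_and_grad(X, y, log_ls, noise=1e-1):
    """Negative log marginal likelihood and its gradient w.r.t. log_ls."""
    K = rbf_kernel(X, log_ls) + noise * np.eye(len(X))
    K_inv = np.linalg.inv(K)
    alpha = K_inv @ y
    nll = 0.5 * y @ alpha + 0.5 * np.linalg.slogdet(K)[1]
    # dK/d(log_ls) for the RBF kernel: K * d2 * exp(-2 * log_ls)
    d2 = (X[:, None] - X[None, :]) ** 2
    dK = rbf_kernel(X, log_ls) * d2 / np.exp(2 * log_ls)
    grad = 0.5 * np.trace((K_inv - np.outer(alpha, alpha)) @ dK)
    return nll, grad

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, 20)
y = np.sin(X) + 0.1 * rng.normal(size=20)

log_ls, lr = 0.0, 0.1            # constant learning rate of 0.1, as quoted
for _ in range(100):             # 100 iterations per step, as quoted
    _, g = nll_and_grad(X, y, log_ls)
    log_ls -= lr * g
final_nll, _ = nll_and_grad(X, y, log_ls)
```

The sketch only mirrors the two quoted training hyperparameters; DyHPO itself trains a deep kernel jointly with the GP, which this toy RBF example does not attempt to reproduce.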