Supervising the Multi-Fidelity Race of Hyperparameter Configurations

Authors: Martin Wistuba, Arlind Kadra, Josif Grabocka

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate the significant superiority of DyHPO against state-of-the-art hyperparameter optimization methods through large-scale experiments comprising 50 datasets (Tabular, Image, NLP) and diverse architectures (MLP, CNN/NAS, RNN)."
Researcher Affiliation | Collaboration | Martin Wistuba, Amazon Web Services, Berlin, Germany, marwistu@amazon.com; Arlind Kadra, University of Freiburg, Freiburg, Germany, kadraa@cs.uni-freiburg.de; Josif Grabocka, University of Freiburg, Freiburg, Germany, grabocka@cs.uni-freiburg.de
Pseudocode | Yes | "Algorithm 1 DYHPO Algorithm" (a hedged sketch of such a loop appears after this table)
Open Source Code | Yes | "Our implementation of DYHPO is publicly available." (Footnote 3: https://github.com/releaunifreiburg/DyHPO)
Open Datasets | Yes | "LCBench: A learning curve benchmark [Zimmer et al., 2021]... TaskSet: A benchmark that features diverse tasks Metz et al. [2020]... NAS-Bench-201: A benchmark consisting of 15625 hyperparameter configurations representing different architectures on the CIFAR-10, CIFAR-100 and ImageNet datasets Dong and Yang [2020]."
Dataset Splits | No | The paper describes the benchmarks used (LCBench, TaskSet, NAS-Bench-201) and their characteristics, but it does not state the training/validation/test splits used for its experiments, nor does it refer to specific predefined splits within those benchmarks.
Hardware Specification | Yes | "We ran all of our experiments on an Amazon EC2 M5 Instance (m5.xlarge)."
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, or other libraries).
Experiment Setup | Yes | "For DYHPO, we use a constant learning rate of 0.1 for training the kernel parameters, and we train for 100 iterations per step. For all methods, we use a single learning rate of 0.001." (a hedged training sketch also appears after this table)
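
The "Pseudocode" row above refers to the paper's Algorithm 1. The snippet below is a minimal, hedged sketch of the same kind of multi-fidelity race: a pool of configurations is advanced one fidelity step at a time, and the configuration whose next step has the highest expected improvement under a surrogate is promoted. Everything concrete here is an assumption made for illustration: the toy search space, the synthetic evaluate() objective, the budget of 10 steps, and the off-the-shelf scikit-learn Gaussian process standing in for the paper's deep-kernel surrogate.

```python
# Hedged sketch of a DyHPO-style greedy one-step promotion loop (not the authors' code).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

# Toy search space: 20 candidate configurations with 2 hyperparameters each (assumed).
configs = rng.uniform(0.0, 1.0, size=(20, 2))
max_budget = 10                                # maximum number of fidelity steps
budgets = np.zeros(len(configs), dtype=int)    # fidelity steps already spent per config

def evaluate(x, budget):
    """Synthetic validation score: improves with budget, depends on the config."""
    return (1.0 - np.exp(-budget / 5.0)) * (1.0 - np.sum((x - 0.6) ** 2))

# Observation history: rows are (hp1, hp2, normalized budget) -> observed score.
X_obs, y_obs = [], []

def expected_improvement(mu, sigma, best):
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

for step in range(60):
    runnable = np.where(budgets < max_budget)[0]
    if len(runnable) == 0:
        break
    if len(X_obs) < 5:
        # Cold start: advance a random configuration by one step.
        idx = rng.choice(runnable)
    else:
        # Surrogate over (config, budget) pairs; stands in for the deep-kernel GP.
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
        gp.fit(np.array(X_obs), np.array(y_obs))
        best = max(y_obs)
        # Score each runnable config at its *next* budget step (one-step lookahead).
        query = np.column_stack([configs[runnable],
                                 (budgets[runnable] + 1) / max_budget])
        mu, sigma = gp.predict(query, return_std=True)
        idx = runnable[np.argmax(expected_improvement(mu, sigma, best))]
    budgets[idx] += 1
    y = evaluate(configs[idx], budgets[idx])
    X_obs.append([configs[idx][0], configs[idx][1], budgets[idx] / max_budget])
    y_obs.append(y)

print("best observed score:", max(y_obs))
```

Running the sketch prints the best score observed across all partial runs; the key design choice it illustrates is that promotion decisions are made one fidelity step at a time rather than committing a full training budget to any configuration up front.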
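The "Experiment Setup" row quotes a constant learning rate of 0.1 for training the kernel parameters over 100 iterations per step. The fragment below sketches what such an update of a deep-kernel Gaussian-process surrogate could look like; it is not the authors' implementation. The two-layer feature network, the RBF kernel, the noise term, the Adam optimizer, and the random toy data are all assumptions for illustration; only the learning rate of 0.1 and the 100 iterations are taken from the quote.

```python
# Hedged sketch: fitting deep-kernel GP parameters by minimizing the negative
# log marginal likelihood, using the quoted settings (lr 0.1, 100 iterations).
import torch

torch.manual_seed(0)

# Toy training data: (hyperparameters + normalized budget) -> observed score (assumed).
X = torch.rand(32, 3)
y = torch.rand(32)

# Small MLP feature extractor (the "deep" part of the deep kernel; architecture assumed).
feature_net = torch.nn.Sequential(
    torch.nn.Linear(3, 32), torch.nn.ReLU(), torch.nn.Linear(32, 8)
)
log_lengthscale = torch.zeros(1, requires_grad=True)
log_noise = torch.tensor([-2.0], requires_grad=True)

def rbf_kernel(z1, z2):
    # Squared-exponential kernel on the learned features.
    d2 = torch.cdist(z1, z2) ** 2
    return torch.exp(-0.5 * d2 / torch.exp(log_lengthscale) ** 2)

params = list(feature_net.parameters()) + [log_lengthscale, log_noise]
optimizer = torch.optim.Adam(params, lr=0.1)   # constant learning rate of 0.1 (quoted)

for _ in range(100):                           # 100 iterations per step (quoted)
    optimizer.zero_grad()
    z = feature_net(X)
    # Noise plus a small fixed jitter keeps the covariance numerically positive definite.
    K = rbf_kernel(z, z) + (torch.exp(log_noise) + 1e-4) * torch.eye(len(X))
    # Negative log marginal likelihood of the GP (up to an additive constant).
    L = torch.linalg.cholesky(K)
    alpha = torch.cholesky_solve(y.unsqueeze(1), L)
    nll = 0.5 * (y.unsqueeze(1) * alpha).sum() + torch.log(torch.diag(L)).sum()
    nll.backward()
    optimizer.step()

print("final negative log marginal likelihood:", nll.item())
```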