HyperJump: Accelerating HyperBand via Risk Modelling

Authors: Pedro Mendes, Maria Casimiro, Paolo Romano, David Garlan

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate HyperJump on a suite of hyper-parameter optimization problems and show that it provides over one-order of magnitude speed-ups, both in sequential and parallel deployments, on a variety of deep-learning, kernel-based learning and neural architectural search problems when compared to HyperBand and to several state-of-the-art optimizers.
Researcher Affiliation | Academia | Pedro Mendes (1,2), Maria Casimiro (1,2), Paolo Romano (1), David Garlan (2). 1: INESC-ID and Instituto Superior Técnico, Universidade de Lisboa; 2: Software and Societal Systems Department, Carnegie Mellon University. {pgmendes, mdaloura, dg4d}@andrew.cmu.edu, romano@inesc-id.pt
Pseudocode | Yes | Algorithm 1: Pseudo-code for a HJ bracket consisting of S stages, with budget b for the initial stage. [...] Algorithm 2: Pseudo-code of the logic used to determine the sets of configurations to consider when jumping from stage s to stage s + 1 (function GET CANDIDATES FOR S()) [...] Algorithm 3: Pseudo-code for the EVALUATE JUMP RISK function.
Open Source Code | Yes | We have made available the implementation of HJ and the benchmarks used: https://github.com/pedrogbmendes/HyperJump
Open Datasets | Yes | Our first benchmark, NATS-Bench (Dong et al. 2021)... We also use LIBSVM (Chang and Lin 2011) on the Covertype data set (Dua and Graff 2017)...
Dataset Splits | No | The paper mentions using NATS-Bench and LIBSVM datasets but does not provide specific details on the training/validation/test splits (e.g., percentages, sample counts, or methodology for splitting).
Hardware Specification | No | The paper discusses parallel deployments using '32 workers' but does not specify the type of hardware (e.g., GPU/CPU models, memory) used for the experiments.
Software Dependencies | No | The paper mentions software frameworks like Ray Tune and implementations of BOHB and HB, but it does not specify version numbers for any software dependencies.
Experiment Setup | Yes | We use the default parameters of BOHB and Fabolas. Similarly to HB, we set the parameter η to 3, and for fairness, when comparing HJ, HB, BOHB, and ASHA, we configure them to use the same η value. We use the default value of 10% for the threshold λ for HJ and include in the supplemental material a study on the sensitivity to the tuning of λ.
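
To make the bracket structure referenced in the Pseudocode row concrete, the following is a minimal Python sketch of a successive-halving bracket extended with a risk-gated jump. It is an illustrative assumption rather than a transcription of the paper's Algorithms 1-3: the helpers sample_config, train_and_score, and evaluate_jump_risk are hypothetical placeholders, and the promotion rule, which only ranks already-evaluated configurations, simplifies the candidate-selection logic of the paper's GET CANDIDATES FOR S function.

    import random

    def run_bracket(sample_config, train_and_score, n0, b0, eta=3, num_stages=4,
                    risk_threshold=0.10, evaluate_jump_risk=None):
        """Run one successive-halving bracket: n0 configs, initial budget b0.

        Before each evaluation, an optional risk callback estimates the risk of
        skipping the remaining evaluations of the current stage; if that risk is
        below risk_threshold, the bracket jumps to the next stage early.
        """
        configs = [sample_config() for _ in range(n0)]
        budget = b0
        for stage in range(num_stages):
            evaluated = []              # (loss, config) pairs finished in this stage
            pending = list(configs)
            while pending:
                if evaluate_jump_risk is not None and evaluated:
                    risk = evaluate_jump_risk(evaluated, pending, budget)
                    if risk < risk_threshold:
                        break           # jump: skip the rest of this stage
                cfg = pending.pop()
                evaluated.append((train_and_score(cfg, budget), cfg))
            evaluated.sort(key=lambda pair: pair[0])
            keep = max(1, len(configs) // eta)  # successive halving: keep top 1/eta
            configs = [cfg for _, cfg in evaluated[:keep]]
            budget *= eta               # survivors get eta times more budget
        return configs[0]

    # Toy usage: minimize (x - 0.3)^2 with noise that shrinks as the budget grows.
    best = run_bracket(
        sample_config=lambda: random.random(),
        train_and_score=lambda x, b: (x - 0.3) ** 2 + random.gauss(0, 1.0 / b),
        n0=27, b0=1)
    print(best)

The risk_threshold parameter plays the role of the λ = 10% threshold quoted in the Experiment Setup row, although in HyperJump the risk estimate comes from a model of the chance of discarding the eventual best configuration, not from a user-supplied callback as assumed here.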
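
The Experiment Setup row fixes the halving rate η to 3 for HJ, HB, BOHB, and ASHA. For background, the short sketch below prints the standard HyperBand bracket schedule that a given η induces for a maximum per-configuration budget R; the formula is the one from Li et al.'s HyperBand, and R = 81 is an arbitrary example value, not a setting reported in the paper.

    import math

    def hyperband_schedule(R, eta=3):
        """Print, for each bracket, the (number of configs, budget) pairs per stage."""
        s_max = int(math.floor(math.log(R, eta) + 1e-9))  # epsilon guards FP rounding
        B = (s_max + 1) * R                                # total budget per bracket
        for s in range(s_max, -1, -1):
            n = int(math.ceil(B / R * eta ** s / (s + 1)))
            r = R / eta ** s
            stages = [(n // eta ** i, r * eta ** i) for i in range(s + 1)]
            print(f"bracket s={s}: {stages}")

    # With R = 81 and eta = 3, the most aggressive bracket runs 81 configs at
    # budget 1 each, then 27 at 3, 9 at 9, 3 at 27, and 1 at 81.
    hyperband_schedule(81, eta=3)

With η = 3, every stage keeps roughly one third of the surviving configurations and triples their budget, which is the promotion schedule shared by all the optimizers compared in the paper.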