PASHA: Efficient HPO and NAS with Progressive Resource Allocation

Authors: Ondrej Bohdal, Lukas Balles, Martin Wistuba, Beyza Ermis, Cedric Archambeau, Giovanni Zappella

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The experimental comparison shows that PASHA identifies well-performing hyperparameter configurations and architectures while consuming significantly fewer computational resources than ASHA." "Our empirical evaluation shows PASHA can save a significant amount of resources while finding similarly well-performing configurations as conventional ASHA, reducing the entry barrier to do HPO and NAS." "Our empirical evaluation shows the approach significantly speeds up HPO and NAS without sacrificing the performance." "In this section we empirically evaluate the performance of PASHA."
Researcher Affiliation | Collaboration | Ondrej Bohdal (The University of Edinburgh, ondrej.bohdal@ed.ac.uk), Lukas Balles (AWS, Berlin), Martin Wistuba (AWS, Berlin), Beyza Ermis (Cohere for AI, beyza@cohere.com), Cédric Archambeau (AWS, Berlin), Giovanni Zappella (AWS, Berlin); AWS contacts: {balleslb,marwistu,cedrica,zappella}@amazon.com
Pseudocode | Yes | "We describe the details of our proposed approach in Algorithm 1." "Algorithm 1: Progressive Asynchronous Successive Halving (PASHA)" (a simplified Python sketch of the progressive rule follows the table)
Open Source Code | Yes | "We include the code for our approach as part of the supplementary material, including details for how to run the experiments." "In addition, PASHA is available as part of the Syne Tune library (Salinas et al., 2022)." (a Syne Tune usage sketch follows the table)
Open Datasets | Yes | "We tested our method on two different sets of experiments. The first set evaluates the algorithm on NAS problems and uses NASBench201 (Dong & Yang, 2020), while the second set focuses on HPO and was run on two large-scale tasks from the PD1 benchmark (Wang et al., 2021)."
Dataset Splits | Yes | "For the purpose of these experiments we re-train all the models using only the training set. This avoids introducing an arbitrary choice on the validation set size and allows us to leverage standard benchmarks such as NASBench201." "To measure the predictive performance we report the best accuracy on the combined validation and test set provided by the creators of the benchmark."
Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU models, CPU specifications, or memory, beyond mentioning "4 workers".
Software Dependencies | No | The paper mentions that its implementation is based on the "Syne Tune library (Salinas et al., 2022)" but does not specify a version number for this or any other software component used.
Experiment Setup | Yes | "Our experimental setup consists of two phases: 1) run the hyperparameter optimizer until N = 256 candidate configurations are evaluated; and 2) use the best configuration identified in the first phase to re-train the model from scratch." "We use 4 workers to perform parallel and asynchronous evaluations." "r is also dataset-dependent and η, the halving factor, is set to 3 unless otherwise specified." "For our NAS experiments... We use r = 1 epoch and R = 200 epochs." "In PD1 we optimize four hyperparameters: base learning rate η ∈ [10⁻⁵, 10.0] (log scale), momentum 1 − β ∈ [10⁻³, 1.0] (log scale), polynomial learning rate decay schedule power p ∈ [0.1, 2.0] (linear scale) and decay steps fraction λ ∈ [0.01, 0.99] (linear scale)." "The minibatch size used for WMT experiments is 64, while the minibatch size for ImageNet experiments is 512."
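For intuition, here is a minimal Python sketch of the progressive rule behind Algorithm 1, referenced in the Pseudocode row above: run successive-halving rungs as in ASHA, but extend the maximum rung only while the configuration rankings at the two highest rungs disagree. This simplification omits PASHA's soft-ranking tolerance and asynchronous scheduling; all function and variable names are illustrative, not taken from the authors' code.

```python
# Illustrative sketch of PASHA's progressive resource-allocation rule
# (simplified: no soft-ranking tolerance, no asynchronous workers).

def rung_levels(r, eta, top_rung):
    """Resource levels r, r*eta, r*eta^2, ..., up to index `top_rung`."""
    return [r * eta ** k for k in range(top_rung + 1)]

def ranking_at(results, rung):
    """Configurations evaluated at `rung`, sorted best-first by accuracy.

    `results` maps (config_id, rung) -> validation accuracy.
    """
    scored = [(c, acc) for (c, lvl), acc in results.items() if lvl == rung]
    return [c for c, _ in sorted(scored, key=lambda t: -t[1])]

def needs_more_resources(results, rungs):
    """Grow the maximum rung while the top two rungs rank configs differently."""
    top, below = rungs[-1], rungs[-2]
    top_rank = ranking_at(results, top)
    # Restrict the lower rung's ranking to configs already promoted to the top.
    below_rank = [c for c in ranking_at(results, below) if c in top_rank]
    return top_rank != below_rank

# Example: rankings agree at the two highest rungs -> no budget increase.
results = {("a", 1): 0.6, ("b", 1): 0.5, ("a", 3): 0.7, ("b", 3): 0.65}
print(needs_more_resources(results, rung_levels(r=1, eta=3, top_rung=1)))  # False
```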
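Since the paper states that PASHA is available in Syne Tune, a hedged usage sketch follows, matching the Open Source Code row above. The training script train_script.py is hypothetical, and the exact constructor arguments can differ between Syne Tune versions; consult the documentation of the release you use.

```python
# Hedged sketch: launching PASHA through Syne Tune with 4 workers and a
# budget of 256 trials, mirroring the paper's setup. "train_script.py"
# is a hypothetical training script that reports "validation_accuracy"
# once per "epoch".
from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import LocalBackend
from syne_tune.config_space import loguniform
from syne_tune.optimizer.baselines import PASHA

config_space = {
    "lr": loguniform(1e-5, 10.0),
    "epochs": 200,  # maximum resource R
}

scheduler = PASHA(
    config_space,
    metric="validation_accuracy",
    mode="max",
    resource_attr="epoch",
    max_resource_attr="epochs",
)

tuner = Tuner(
    trial_backend=LocalBackend(entry_point="train_script.py"),
    scheduler=scheduler,
    stop_criterion=StoppingCriterion(max_num_trials_started=256),
    n_workers=4,  # matches the paper's 4 parallel workers
)
tuner.run()
```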
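To make the PD1 search space quoted in the Experiment Setup row concrete, one possible encoding with Syne Tune's config_space helpers is shown below; the dictionary keys are illustrative, not PD1's canonical names.

```python
# Possible encoding of the PD1 search space described in the Experiment
# Setup row; keys are illustrative.
from syne_tune.config_space import loguniform, uniform

pd1_config_space = {
    "base_lr": loguniform(1e-5, 10.0),            # η, log scale
    "one_minus_momentum": loguniform(1e-3, 1.0),  # 1 - β, log scale
    "decay_power": uniform(0.1, 2.0),             # p, linear scale
    "decay_steps_fraction": uniform(0.01, 0.99),  # λ, linear scale
}
```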