PASHA: Efficient HPO and NAS with Progressive Resource Allocation
Authors: Ondrej Bohdal, Lukas Balles, Martin Wistuba, Beyza Ermis, Cédric Archambeau, Giovanni Zappella
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental comparison shows that PASHA identifies well-performing hyperparameter configurations and architectures while consuming significantly fewer computational resources than ASHA. Our empirical evaluation shows PASHA can save a significant amount of resources while finding similarly well-performing configurations as conventional ASHA, reducing the entry barrier to do HPO and NAS. Our empirical evaluation shows the approach significantly speeds up HPO and NAS without sacrificing the performance. In this section we empirically evaluate the performance of PASHA. |
| Researcher Affiliation | Collaboration | Ondrej Bohdal¹, Lukas Balles², Martin Wistuba², Beyza Ermis³, Cédric Archambeau², Giovanni Zappella² — ¹The University of Edinburgh, ²AWS, Berlin, ³Cohere for AI; ¹ondrej.bohdal@ed.ac.uk, ³beyza@cohere.com, ²{balleslb,marwistu,cedrica,zappella}@amazon.com |
| Pseudocode | Yes | We describe the details of our proposed approach in Algorithm 1. Algorithm 1: Progressive Asynchronous Successive Halving (PASHA) |
| Open Source Code | Yes | We include the code for our approach as part of the supplementary material, including details for how to run the experiments. In addition, PASHA is available as part of the Syne Tune library (Salinas et al., 2022). |
| Open Datasets | Yes | We tested our method on two different sets of experiments. The first set evaluates the algorithm on NAS problems and uses NASBench201 (Dong & Yang, 2020), while the second set focuses on HPO and was run on two large-scale tasks from PD1 benchmark (Wang et al., 2021). |
| Dataset Splits | Yes | For the purpose of these experiments we re-train all the models using only the training set. This avoids introducing an arbitrary choice on the validation set size and allows us to leverage standard benchmarks such as NASBench201. To measure the predictive performance we report the best accuracy on the combined validation and test set provided by the creators of the benchmark. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU models, CPU specifications, or memory, beyond mentioning '4 workers'. |
| Software Dependencies | No | The paper mentions that its implementation is based on the 'Syne Tune library (Salinas et al., 2022)' but does not specify a version number for this or any other software component used. |
| Experiment Setup | Yes | Our experimental setup consists of two phases: 1) run the hyperparameter optimizer until N = 256 candidate configurations are evaluated; and 2) use the best configuration identified in the first phase to re-train the model from scratch. We use 4 workers to perform parallel and asynchronous evaluations. r is also dataset-dependent and η, the halving factor, is set to 3 unless otherwise specified. For our NAS experiments... We use r = 1 epoch and R = 200 epochs. In PD1 we optimize four hyperparameters: base learning rate η ∈ [10⁻⁵, 10.0] (log scale), momentum 1 − β ∈ [10⁻³, 1.0] (log scale), polynomial learning rate decay schedule power p ∈ [0.1, 2.0] (linear scale) and decay steps fraction λ ∈ [0.01, 0.99] (linear scale). The minibatch size used for WMT experiments is 64, while the minibatch size for ImageNet experiments is 512. |
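
The Pseudocode row refers to Algorithm 1 (Progressive Asynchronous Successive Halving). For orientation, below is a minimal synchronous sketch of the idea, written for this report rather than taken from the paper: `evaluate`, `ranking`, and the rule that stops growing the budget once consecutive rungs rank the survivors the same way are illustrative stand-ins for PASHA's soft-ranking stopping criterion, with `r`, `R`, and `eta` matching the values quoted in the Experiment Setup row.

```python
import random


def evaluate(config, epochs):
    """Hypothetical objective: pretend to train `config` for `epochs`
    epochs and return a validation accuracy (replace with real training)."""
    rng = random.Random(hash((config, epochs)))
    return rng.random()


def ranking(scores):
    """Configuration names sorted by score, best first."""
    return sorted(scores, key=scores.get, reverse=True)


def pasha(configs, r=1, R=200, eta=3):
    """Synchronous simplification of PASHA (Algorithm 1 is asynchronous):
    promote the top 1/eta of survivors to the next rung, but stop
    allocating more epochs once two consecutive rungs agree on the
    ranking of the remaining configurations."""
    budget, survivors, prev = r, list(configs), None
    while len(survivors) > 1 and budget <= R:
        scores = {c: evaluate(c, budget) for c in survivors}
        curr = ranking(scores)
        survivors = curr[: max(1, len(survivors) // eta)]
        # Progressive rule: if the current budget already separates the
        # top configurations the same way the previous rung did, there
        # is no need to train for longer.
        if prev is not None and prev[: len(survivors)] == curr[: len(survivors)]:
            break
        prev = curr
        budget *= eta
    return survivors[0]


if __name__ == "__main__":
    print(pasha([f"cfg-{i:02d}" for i in range(27)]))
```

The key difference from plain ASHA is that the maximum resource level is not fixed up front: budgets grow only while the ranking between rungs is still unstable, which is where the reported resource savings come from.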
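Since the paper states PASHA is available in the Syne Tune library, a launch script along the following lines should be close to practice. This is a sketch under assumptions: the exact baseline signature and argument names (`resource_attr`, `max_resource_attr`, the `accuracy` metric) and the `train.py` entry point are illustrative and may differ between Syne Tune versions; the search space mirrors the PD1 ranges from the Experiment Setup row.

```python
from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import LocalBackend
from syne_tune.config_space import loguniform, uniform
from syne_tune.optimizer.baselines import PASHA

# PD1-style search space from the Experiment Setup row.
config_space = {
    "learning_rate": loguniform(1e-5, 10.0),      # base learning rate (log scale)
    "one_minus_momentum": loguniform(1e-3, 1.0),  # 1 - beta (log scale)
    "power": uniform(0.1, 2.0),                   # polynomial decay power
    "decay_steps_fraction": uniform(0.01, 0.99),  # fraction of steps to decay over
    "epochs": 200,                                # maximum resource R
}

scheduler = PASHA(
    config_space,
    metric="accuracy",          # assumed metric name reported by train.py
    resource_attr="epoch",      # per-epoch resource reported by train.py
    max_resource_attr="epochs",
    mode="max",
)

tuner = Tuner(
    trial_backend=LocalBackend(entry_point="train.py"),  # hypothetical training script
    scheduler=scheduler,
    stop_criterion=StoppingCriterion(max_num_trials_started=256),  # N = 256 as in the paper
    n_workers=4,                # 4 parallel asynchronous workers as in the paper
)
tuner.run()
```

The `train.py` script would be expected to report the tracked metric once per epoch so the scheduler can make promotion decisions at each rung.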