Multi-Objective Population Based Training
Authors: Arkadiy Dushatskiy, Alexander Chebykin, Tanja Alderliesten, Peter Bosman
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on diverse multi-objective hyperparameter optimization problems (Precision/Recall, Accuracy/Fairness, Accuracy/Adversarial Robustness) show that MO-PBT outperforms random search, single-objective PBT, and the state-of-the-art multi-objective hyperparameter optimization algorithm MO-ASHA. |
| Researcher Affiliation | Academia | 1 Centrum Wiskunde & Informatica, Amsterdam, the Netherlands; 2 Leiden University Medical Center, Leiden, the Netherlands; 3 Delft University of Technology, Delft, the Netherlands. |
| Pseudocode | Yes | Algorithm 1: Procedure to sort solutions in MO-PBT (sortPopulation); Algorithm 2: Exploit in MO-PBT (exploit); Algorithm 3: Explore in MO-PBT (explore). (An illustrative exploit-and-explore sketch based on non-dominated sorting is given below the table.) |
| Open Source Code | Yes | Further experimental setup details are provided in Appendix B. The code is available at https://github.com/ArkadiyD/MO-PBT. |
| Open Datasets | Yes | Adult (Dua & Graff, 2017), Higgs (Baldi et al., 2014), and Click prediction (Vanschoren et al., 2013); CelebA dataset (Liu et al., 2015); CIFAR-10/100 datasets |
| Dataset Splits | Yes | Datasets are split into train/validation/test subsets before experiments. In our main results, we report the above-described hypervolume metric on the validation subset to evaluate the search performance of the algorithms. (A minimal hypervolume computation sketch is given below the table.) |
| Hardware Specification | Yes | We used machines with 3 Nvidia A5000 GPUs and trained 4 networks on each GPU simultaneously, i.e., 12 networks could be trained in parallel. |
| Software Dependencies | No | We implemented all algorithms using the Ray Tune library (Liaw et al., 2018). Network training was performed using PyTorch (Paszke et al., 2019). (Specific versions of PyTorch and Ray Tune are not provided, only citations.) |
| Experiment Setup | Yes | We use a population of size 32 in our main experiments, with the exploit-and-explore procedure applied every 2 epochs of training. Batch size is set to 512 and training is performed for 100 epochs. On the image datasets, we use the standard WideResNet cosine learning rate schedule (used, for instance, in (Cubuk et al., 2020)) with an initial learning rate of 0.1, SGD with a momentum value of 0.9, and batch size 128; training is performed for 100 epochs. For all described optimization tasks, the hyperparameter search spaces are specified in Appendix H. (A minimal sketch of the quoted learning-rate schedule is given below the table.) |
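
To make the pseudocode row concrete, here is a minimal, illustrative Python sketch of a multi-objective exploit-and-explore step driven by non-dominated sorting. This is not the authors' implementation from the linked repository: the population representation, the truncation fraction, the perturbation factors, and the omission of any tie-breaking within fronts are all assumptions made for illustration.

```python
import copy
import random
from typing import Dict, List, Tuple

def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """True if objective vector `a` Pareto-dominates `b` (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def non_dominated_ranks(scores: List[Tuple[float, ...]]) -> List[int]:
    """Assign each solution the index of its non-dominated front (0 = best)."""
    remaining = set(range(len(scores)))
    ranks = [0] * len(scores)
    front = 0
    while remaining:
        current = {i for i in remaining
                   if not any(dominates(scores[j], scores[i])
                              for j in remaining if j != i)}
        for i in current:
            ranks[i] = front
        remaining -= current
        front += 1
    return ranks

def exploit_and_explore(population: List[Dict[str, float]],
                        scores: List[Tuple[float, ...]],
                        truncation: float = 0.25,
                        perturb: Tuple[float, float] = (0.8, 1.2)) -> List[Dict[str, float]]:
    """Workers in the worst fronts clone a configuration from the best fronts
    (exploit) and then perturb it multiplicatively (explore)."""
    ranks = non_dominated_ranks(scores)
    order = sorted(range(len(population)), key=lambda i: ranks[i])  # best first
    n_trunc = max(1, int(truncation * len(population)))
    top, bottom = order[:n_trunc], order[-n_trunc:]
    new_population = copy.deepcopy(population)
    for loser in bottom:
        winner = random.choice(top)
        cloned = copy.deepcopy(population[winner])   # exploit: copy the winner
        for key in cloned:                           # explore: perturb each value
            cloned[key] *= random.uniform(*perturb)
        new_population[loser] = cloned
    return new_population
```
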
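The hypervolume metric mentioned in the Dataset Splits row can be illustrated with a small 2D computation. The sketch below assumes both objectives are maximized and takes a caller-supplied reference point; the paper's exact reference points and any normalization are not reproduced here.

```python
from typing import List, Tuple

def hypervolume_2d(points: List[Tuple[float, float]],
                   reference: Tuple[float, float]) -> float:
    """Area dominated by `points` and bounded by `reference` (both objectives maximized)."""
    # Keep only points that strictly improve on the reference point in both objectives.
    pts = [p for p in points if p[0] > reference[0] and p[1] > reference[1]]
    # Sweep from the largest first objective and accumulate the new slab in the second.
    pts.sort(key=lambda p: p[0], reverse=True)
    area, best_f2 = 0.0, reference[1]
    for f1, f2 in pts:
        if f2 > best_f2:
            area += (f1 - reference[0]) * (f2 - best_f2)
            best_f2 = f2
    return area

# Example: a hypothetical precision/recall front against reference point (0, 0).
front = [(0.9, 0.4), (0.8, 0.6), (0.6, 0.8)]
print(hypervolume_2d(front, reference=(0.0, 0.0)))  # 0.64
```
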
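For the image-dataset training recipe quoted in the Experiment Setup row (SGD with momentum 0.9, initial learning rate 0.1, cosine learning-rate schedule, batch size 128, 100 epochs), a minimal PyTorch sketch is given below. The model is a placeholder rather than the WideResNet used in the paper, and details such as weight decay or data augmentation are omitted because they are not part of the quote.

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

EPOCHS = 100      # from the quoted setup
BATCH_SIZE = 128  # image datasets, per the quoted setup

model = torch.nn.Linear(10, 2)  # placeholder; the paper trains a WideResNet

# SGD with an initial learning rate of 0.1 and momentum 0.9, as quoted above.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Cosine annealing of the learning rate over the full training run.
scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)

for epoch in range(EPOCHS):
    # ... one epoch of training with batches of size BATCH_SIZE would go here ...
    optimizer.step()  # placeholder update so the sketch runs end to end
    scheduler.step()  # advance the cosine schedule once per epoch
```
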