Multi-Objective Population Based Training

Authors: Arkadiy Dushatskiy, Alexander Chebykin, Tanja Alderliesten, Peter Bosman

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on diverse multi-objective hyperparameter optimization problems (Precision/Recall, Accuracy/Fairness, Accuracy/Adversarial Robustness) show that MO-PBT outperforms random search, single-objective PBT, and the state-of-the-art multi-objective hyperparameter optimization algorithm MO-ASHA.
Researcher Affiliation | Academia | 1 Centrum Wiskunde & Informatica, Amsterdam, the Netherlands; 2 Leiden University Medical Center, Leiden, the Netherlands; 3 Delft University of Technology, Delft, the Netherlands.
Pseudocode | Yes | Algorithm 1: Procedure to sort solutions in MO-PBT (sortPopulation); Algorithm 2: Exploit in MO-PBT (exploit); Algorithm 3: Explore in MO-PBT (explore). (A hedged sketch of such an exploit step appears after this table.)
Open Source Code | Yes | Further experimental setup details are provided in Appendix B. The code is available at https://github.com/ArkadiyD/MO-PBT.
Open Datasets | Yes | Adult (Dua & Graff, 2017), Higgs (Baldi et al., 2014), and Click prediction (Vanschoren et al., 2013); CelebA dataset (Liu et al., 2015); CIFAR-10/100 datasets.
Dataset Splits | Yes | Datasets are split into train/validation/test subsets before experiments. In our main results, we report the above-described hypervolume metric on the validation subset to evaluate the search performance of the algorithms. (A small hypervolume computation sketch appears after this table.)
Hardware Specification | Yes | We used machines with 3 Nvidia A5000 GPUs and trained 4 networks on each GPU simultaneously, i.e., 12 networks could be trained in parallel. (A hedged Ray Tune resource-allocation sketch appears after this table.)
Software Dependencies | No | We implemented all algorithms using the Ray Tune library (Liaw et al., 2018). Network training was performed using PyTorch (Paszke et al., 2019). (Specific versions of PyTorch and Ray Tune are not provided, only citations.)
Experiment Setup | Yes | We use a population of size 32 in our main experiments, with the exploit-and-explore procedure applied every 2 epochs of training. Batch size is set to 512, and training is performed for 100 epochs. On the image datasets, we use the cosine learning rate schedule standard for WideResNet (used, for instance, in (Cubuk et al., 2020)) with an initial learning rate of 0.1 for SGD with a momentum value of 0.9, and batch size 128; training is performed for 100 epochs. For all described optimization tasks, search spaces of hyperparameters are specified in Appendix H. (A PyTorch sketch of the quoted optimizer and schedule settings appears after this table.)
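
To illustrate the pseudocode row, below is a minimal Python sketch of what a multi-objective exploit step could look like, assuming NSGA-II-style non-dominated sorting with crowding-distance tie-breaking and a generic PBT truncation rule. The paper's Algorithms 1-3 are the authoritative procedures and may differ in detail; the member API (`load_checkpoint`, `save_checkpoint`, `hyperparams`) is hypothetical.

```python
# Hedged sketch: multi-objective ranking + PBT-style exploit (maximization).
# Names and the population-member API are hypothetical, not the paper's code.
import random
from typing import List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if objective vector `a` Pareto-dominates `b` (all >=, at least one >)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def nondominated_fronts(objs: List[Sequence[float]]) -> List[List[int]]:
    """Split solution indices into successive non-dominated fronts."""
    remaining = set(range(len(objs)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)]
        fronts.append(front)
        remaining -= set(front)
    return fronts

def crowding_distance(objs, front):
    """Crowding distance of each index in `front` (larger = more isolated)."""
    dist = {i: 0.0 for i in front}
    for m in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda i: objs[i][m])
        lo, hi = objs[ordered[0]][m], objs[ordered[-1]][m]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue
        for k in range(1, len(ordered) - 1):
            dist[ordered[k]] += (objs[ordered[k + 1]][m] - objs[ordered[k - 1]][m]) / (hi - lo)
    return dist

def sort_population(objs):
    """Rank solutions: best front first, crowding distance as tie-break."""
    ranking = []
    for front in nondominated_fronts(objs):
        dist = crowding_distance(objs, front)
        ranking.extend(sorted(front, key=lambda i: dist[i], reverse=True))
    return ranking  # indices from best to worst

def exploit(population, objs, fraction=0.25):
    """Members in the worst fraction copy weights and hyperparameters
    from a randomly chosen member of the best fraction."""
    order = sort_population(objs)
    cutoff = max(1, int(len(population) * fraction))
    top, bottom = order[:cutoff], order[-cutoff:]
    for loser in bottom:
        winner = random.choice(top)
        population[loser].load_checkpoint(population[winner].save_checkpoint())
        population[loser].hyperparams = dict(population[winner].hyperparams)
    return bottom  # these members would subsequently run the explore step
```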
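The dataset-splits row reports a hypervolume metric on the validation subset. As a reference for how such a metric can be computed in the two-objective case, here is a small sketch assuming maximization of both objectives and a user-chosen reference point; the paper defines its own reference point and normalization.

```python
# Hedged sketch: 2-D hypervolume (area dominated by the points, bounded by `ref`).
def hypervolume_2d(points, ref):
    # Keep only points that strictly improve on the reference point.
    pts = [(x, y) for x, y in points if x > ref[0] and y > ref[1]]
    # Sweep from best to worst first objective, adding disjoint rectangles.
    pts.sort(key=lambda p: p[0], reverse=True)
    hv, y_covered = 0.0, ref[1]
    for x, y in pts:
        if y > y_covered:
            hv += (x - ref[0]) * (y - y_covered)
            y_covered = y
    return hv

# Example with two non-dominated points and reference point (0, 0):
print(hypervolume_2d([(0.9, 0.2), (0.6, 0.7)], ref=(0.0, 0.0)))  # 0.9*0.2 + 0.6*0.5 = 0.48
```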
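The hardware row notes 4 networks trained per GPU. In Ray Tune, this kind of GPU sharing is typically expressed with fractional GPU requests. The snippet below is a sketch using the Ray 2.x API (`tune.with_resources`, `tune.Tuner`); since the report confirms no specific Ray Tune version, the exact API and the `train_network` trainable are assumptions.

```python
# Hedged sketch: fractional GPU allocation so 4 trials share each GPU.
from ray import tune

def train_network(config):
    ...  # a population member's training loop would go here (hypothetical)

# Requesting gpu=0.25 per trial lets 4 trials co-locate on one GPU;
# with 3 GPUs, up to 12 trials run concurrently.
trainable = tune.with_resources(train_network, {"cpu": 2, "gpu": 0.25})
tuner = tune.Tuner(trainable, tune_config=tune.TuneConfig(num_samples=32))
# results = tuner.fit()
```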
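Finally, the image-dataset settings quoted in the experiment-setup row (SGD with momentum 0.9, initial learning rate 0.1, cosine schedule over 100 epochs, batch size 128) map onto standard PyTorch components roughly as follows; the WideResNet model itself is omitted and replaced by a placeholder.

```python
# Hedged sketch of the quoted optimizer and learning-rate schedule in PyTorch.
import torch

model = torch.nn.Linear(3 * 32 * 32, 10)  # placeholder standing in for a WideResNet
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

for epoch in range(100):
    # ... one epoch over the CIFAR-10/100 loader with batch size 128 ...
    scheduler.step()  # cosine decay of the learning rate per epoch
```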