Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules
Authors: Daniel Ho, Eric Liang, Xi Chen, Ion Stoica, Pieter Abbeel
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that PBA can match the performance of AutoAugment on CIFAR-10, CIFAR-100, and SVHN, with three orders of magnitude less overall compute. On CIFAR-10 we achieve a mean test error of 1.46%, which is a slight improvement upon the current state-of-the-art. and Section 4: Experiments and Analysis |
| Researcher Affiliation | Collaboration | ¹EECS, UC Berkeley, Berkeley, California, USA; ²Current affiliation: X, Mountain View, California, USA; ³covariant.ai, Berkeley, California, USA |
| Pseudocode | Yes | Algorithm 1: The PBA augmentation policy template, the parameters of which are optimized by PBT. and Algorithm 2: The PBA explore function. (A hedged sketch of the explore step appears below the table.) |
| Open Source Code | Yes | The code for PBA is open source and is available at https://github.com/arcelien/pba. |
| Open Datasets | Yes | We show that PBA can match the performance of AutoAugment on CIFAR-10, CIFAR-100, and SVHN, citing the CIFAR-10 (Krizhevsky, 2009) and SVHN (Netzer et al., 2011) datasets. |
| Dataset Splits | Yes | Eval: We evaluate a trial on a validation set not used for PBT training and disjoint from the final test set. and Following (Cubuk et al., 2018), we search over a reduced dataset of 4,000 and 1,000 training images for CIFAR-10 and SVHN respectively. (A hedged split sketch appears below the table.) |
| Hardware Specification | Yes | AutoAugment reported estimated cost in Tesla P100 GPU hours, while PBA measured cost in Titan XP GPU hours. and We learn a robust augmentation policy on CIFAR-10 data in five hours using one NVIDIA Titan XP GPU |
| Software Dependencies | No | The paper mentions using 'Ray' but does not provide specific version numbers for Ray or any other software dependencies. |
| Experiment Setup | Yes | PyramidNet with ShakeDrop uses a batch size of 64, and all other models use a batch size of 128. and For Wide-ResNet-28-10 and Wide-ResNet-40-2 trained on SVHN, we use the step learning rate schedule proposed in (DeVries & Taylor, 2017), and for all others we use a cosine learning rate with one annealing cycle (Loshchilov & Hutter, 2016). For all models, we use gradient clipping with magnitude 5. (A hedged training-loop sketch appears below the table.) |
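
The explore step quoted in the Pseudocode row can be illustrated with a minimal Python sketch. This is not the authors' implementation: the discretization of the (probability, magnitude) values, the resample probability, and the perturbation amount below are illustrative assumptions about a PBT-style explore function over PBA's policy parameters.

```python
import random

# Hypothetical discretization: probability indices 0..10 (tenths) and
# magnitude indices 0..9, in the style of AutoAugment-like policies.
PROB_LEVELS = 11
MAG_LEVELS = 10

def explore(policy, resample_prob=0.2, amt=3):
    """Perturb a list of (prob_idx, mag_idx) parameters, PBT-style.

    With probability `resample_prob`, a parameter is resampled uniformly
    from its range; otherwise it is shifted by a small random amount and
    clipped back into range. Constants are illustrative placeholders,
    not the paper's exact values.
    """
    new_policy = []
    for prob_idx, mag_idx in policy:
        if random.random() < resample_prob:
            prob_idx = random.randrange(PROB_LEVELS)
        else:
            prob_idx = min(max(prob_idx + random.randint(-amt, amt), 0), PROB_LEVELS - 1)
        if random.random() < resample_prob:
            mag_idx = random.randrange(MAG_LEVELS)
        else:
            mag_idx = min(max(mag_idx + random.randint(-amt, amt), 0), MAG_LEVELS - 1)
        new_policy.append((prob_idx, mag_idx))
    return new_policy

# Example: perturb a policy of 15 operations, each with (prob, magnitude) indices.
policy = [(5, 4)] * 15
print(explore(policy))
```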
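
The Dataset Splits row quotes a reduced search set of 4,000 (CIFAR-10) and 1,000 (SVHN) training images plus a validation set disjoint from the test set. A minimal NumPy sketch of such a split is below; the validation-set size and the random seed are hypothetical placeholders, since the quoted text does not specify them.

```python
import numpy as np

def reduced_split(num_train_total, num_search, num_val, seed=0):
    """Sample a reduced search split and a disjoint validation split.

    `num_search` follows the sizes quoted above (4,000 for CIFAR-10,
    1,000 for SVHN); `num_val` and `seed` are illustrative placeholders.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_train_total)
    search_idx = perm[:num_search]
    val_idx = perm[num_search:num_search + num_val]
    return search_idx, val_idx

# CIFAR-10 has 50,000 training images; take 4,000 of them for policy search.
search_idx, val_idx = reduced_split(50000, num_search=4000, num_val=1000)
assert set(search_idx).isdisjoint(val_idx)  # the two splits do not overlap
```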
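
The Experiment Setup row mentions a cosine learning rate with one annealing cycle and gradient clipping with magnitude 5. The following is a minimal PyTorch sketch of that configuration, not the authors' implementation: the model, optimizer hyperparameters, epoch count, and the reading of "magnitude 5" as a global gradient-norm cap are assumptions.

```python
import torch
from torch import nn, optim

# Stand-in model and optimizer; only the cosine schedule (one annealing
# cycle) and the gradient clipping at 5 mirror the setup quoted above.
model = nn.Linear(32, 10)
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)

epochs = 200  # illustrative; actual epoch counts differ per model
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    # Dummy batch of 128 (the batch size quoted for most models).
    for x, y in [(torch.randn(128, 32), torch.randint(0, 10, (128,)))]:
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()
        # Clip gradients to magnitude 5 (interpreted here as a global norm cap).
        nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
        optimizer.step()
    scheduler.step()  # cosine annealing over a single cycle
```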