Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules

Authors: Daniel Ho, Eric Liang, Xi Chen, Ion Stoica, Pieter Abbeel

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show that PBA can match the performance of AutoAugment on CIFAR-10, CIFAR-100, and SVHN, with three orders of magnitude less overall compute. On CIFAR-10 we achieve a mean test error of 1.46%, which is a slight improvement upon the current state-of-the-art." and Section "4. Experiments and Analysis"
Researcher Affiliation | Collaboration | "EECS, UC Berkeley, Berkeley, California, USA; Current affiliation: X, Mountain View, California, USA; covariant.ai, Berkeley, California, USA."
Pseudocode | Yes | "Algorithm 1: The PBA augmentation policy template, the parameters of which are optimized by PBT." and "Algorithm 2: The PBA explore function." (an illustrative explore sketch follows this table)
Open Source Code | Yes | "The code for PBA is open source and is available at https://github.com/arcelien/pba."
Open Datasets | Yes | "We show that PBA can match the performance of AutoAugment on CIFAR-10, CIFAR-100, and SVHN" and "CIFAR-10 (Krizhevsky, 2009) and SVHN (Netzer et al., 2011) datasets."
Dataset Splits | Yes | "Eval: We evaluate a trial on a validation set not used for PBT training and disjoint from the final test set." and "Following (Cubuk et al., 2018), we search over a reduced dataset of 4,000 and 1,000 training images for CIFAR-10 and SVHN respectively." (a split-construction sketch follows this table)
Hardware Specification | Yes | "AutoAugment reported estimated cost in Tesla P100 GPU hours, while PBA measured cost in Titan XP GPU hours." and "We learn a robust augmentation policy on CIFAR-10 data in five hours using one NVIDIA Titan XP GPU."
Software Dependencies | No | The paper mentions using Ray but does not provide specific version numbers for Ray or any other software dependencies.
Experiment Setup | Yes | "PyramidNet with ShakeDrop uses a batch size of 64, and all other models use a batch size of 128." and "For Wide-ResNet-28-10 and Wide-ResNet-40-2 trained on SVHN, we use the step learning rate schedule proposed in (DeVries & Taylor, 2017), and for all others we use a cosine learning rate with one annealing cycle (Loshchilov & Hutter, 2016). For all models, we use gradient clipping with magnitude 5." (a training-setup sketch follows this table)
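
The Pseudocode row cites Algorithm 2, the PBA explore function that PBT uses to mutate a trial's augmentation hyperparameters. The table does not reproduce the algorithm itself, so the sketch below shows only a generic PBT-style explore step over discretized parameters; the resample probability, parameter range, and perturbation size are illustrative assumptions, not values taken from the paper.

    import random

    # Illustrative constants; the paper's Algorithm 2 may use different values.
    PARAM_MAX = 10          # assumed upper bound of a discretized probability/magnitude value
    RESAMPLE_PROB = 0.2     # assumed chance of resampling a parameter from scratch
    MAX_PERTURB = 3         # assumed maximum +/- shift when perturbing a parameter

    def explore(policy_params):
        """PBT-style explore step: mutate a copy of a trial's augmentation parameters."""
        new_params = list(policy_params)
        for i, value in enumerate(new_params):
            if random.random() < RESAMPLE_PROB:
                # Occasionally resample the parameter uniformly from its full range.
                new_params[i] = random.randint(0, PARAM_MAX)
            else:
                # Otherwise nudge it by a small random amount, clipped to the valid range.
                delta = random.randint(0, MAX_PERTURB) * random.choice([-1, 1])
                new_params[i] = min(PARAM_MAX, max(0, value + delta))
        return new_params

In PBT, an explore step like this is applied to hyperparameters copied from a better-performing trial, which is how a population of trials yields an augmentation schedule that changes over training.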
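
The Dataset Splits row quotes a reduced search set of 4,000 CIFAR-10 (and 1,000 SVHN) training images plus a validation set disjoint from the final test set. Below is a minimal sketch of carving such splits for CIFAR-10 with torchvision and NumPy; the validation size and random seed are illustrative assumptions, since the quoted text does not give them.

    import numpy as np
    from torchvision import datasets

    # Search-set size comes from the quoted paper text; the validation size is an
    # illustrative assumption, not a number reported in this table.
    SEARCH_SIZE = 4000      # reduced CIFAR-10 training set used for policy search
    VAL_SIZE = 1000         # assumed held-out validation size, disjoint from the test set

    full_train = datasets.CIFAR10(root="./data", train=True, download=True)

    rng = np.random.default_rng(seed=0)
    indices = rng.permutation(len(full_train))

    search_idx = indices[:SEARCH_SIZE]                      # images used for PBT policy search
    val_idx = indices[SEARCH_SIZE:SEARCH_SIZE + VAL_SIZE]   # images used to score trials

    # The official CIFAR-10 test split (datasets.CIFAR10(train=False, ...)) is left
    # untouched for final evaluation, so the validation set stays disjoint from it.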
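
The Experiment Setup row quotes batch sizes, a cosine learning-rate schedule with one annealing cycle, and gradient clipping with magnitude 5. The following is a minimal PyTorch sketch of such a training configuration, independent of the paper's released implementation; the placeholder model and data, base learning rate, momentum, weight decay, epoch count, and the choice of norm-based clipping are illustrative assumptions.

    import torch
    from torch import nn, optim
    from torch.optim.lr_scheduler import CosineAnnealingLR
    from torch.utils.data import DataLoader, TensorDataset

    # Quoted settings: batch size 128 (64 for PyramidNet+ShakeDrop), cosine LR with one
    # annealing cycle, gradient clipping with magnitude 5. Everything else is assumed.
    BATCH_SIZE = 128
    EPOCHS = 200            # assumed training length
    CLIP_MAGNITUDE = 5.0

    # Tiny random stand-in dataset so the sketch runs end to end; in practice this would
    # be CIFAR-10/SVHN with the learned PBA augmentation schedule applied.
    images = torch.randn(512, 3, 32, 32)
    labels = torch.randint(0, 10, (512,))
    loader = DataLoader(TensorDataset(images, labels), batch_size=BATCH_SIZE, shuffle=True)

    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # placeholder model
    optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
    scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)  # one cosine annealing cycle
    criterion = nn.CrossEntropyLoss()

    def train_epoch(data_loader):
        model.train()
        for batch_images, batch_labels in data_loader:
            optimizer.zero_grad()
            loss = criterion(model(batch_images), batch_labels)
            loss.backward()
            # "Gradient clipping with magnitude 5"; clipping by global norm is assumed here.
            nn.utils.clip_grad_norm_(model.parameters(), CLIP_MAGNITUDE)
            optimizer.step()

    for epoch in range(EPOCHS):
        train_epoch(loader)
        scheduler.step()  # advance the single cosine annealing cycle

For the SVHN Wide-ResNet runs the quoted text instead uses the step learning-rate schedule of DeVries & Taylor (2017), which would replace the cosine scheduler above.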