AdaAug: Learning Class- and Instance-adaptive Data Augmentation Policies

Authors: Tsz-Him Cheung, Dit-Yan Yeung

ICLR 2022

Reproducibility checklist (variable: result, followed by the LLM response):
Research Type: Experimental. "Our experiments show that the adaptive augmentation policies learned by our method transfer well to unseen datasets such as the Oxford Flowers, Oxford-IIIT Pets, FGVC Aircraft, and Stanford Cars datasets when compared with other AutoDA baselines. In addition, our method also achieves state-of-the-art performance on the CIFAR-10, CIFAR-100, and SVHN datasets."
Researcher Affiliation: Academia. "Tsz-Him Cheung & Dit-Yan Yeung, Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, {thcheungae, dyyeung}@cse.ust.hk"
Pseudocode: Yes. "Algorithm 1 Search algorithm"
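Algorithm 1 alternates between training the task model on policy-augmented data and updating the policy parameters γ on validation data. Below is a heavily simplified, self-contained sketch of that loop, assuming PyTorch ≥ 2.0. The three-op set, the differentiable mixture surrogate, the toy data, and the one-step-unrolled (DARTS-style) validation gradient are illustrative assumptions, not the authors' exact estimator or operation set; see the official repository for the real implementation.

```python
# Simplified sketch of the alternating search loop (Algorithm 1).
# Everything below is an illustrative stand-in, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.func import functional_call  # PyTorch >= 2.0

n_ops = 3
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # stand-in classifier
# h: per-instance op weights + magnitudes; raw pixels stand in for the
# frozen feature extractor f(x) used in the paper.
policy = nn.Linear(3 * 32 * 32, 2 * n_ops)
opt_w = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)
opt_g = torch.optim.Adam(policy.parameters(), lr=1e-3)  # gamma optimizer

def augment(x, params):
    """Differentiable surrogate: a softmax-weighted mixture of simple ops."""
    p = params[:, :n_ops].softmax(dim=1)   # per-instance op weights
    m = params[:, n_ops:].sigmoid()        # per-instance magnitudes
    views = torch.stack(
        [x,                                               # identity
         torch.flip(x, dims=[-1]),                        # horizontal flip
         (x + 0.5 * m[:, 2:3, None, None]).clamp(0, 1)],  # brightness shift
        dim=1)
    return (p[:, :, None, None, None] * views).sum(dim=1)

def toy_batches(n):  # random stand-in data so the sketch runs end to end
    for _ in range(n):
        yield torch.rand(128, 3, 32, 32), torch.randint(0, 10, (128,))

for step, ((x, y), (xv, yv)) in enumerate(zip(toy_batches(50), toy_batches(50))):
    if step % 10 == 0:  # outer step: update gamma via validation loss
        loss = F.cross_entropy(model(augment(x, policy(x.flatten(1)))), y)
        params = dict(model.named_parameters())
        grads = torch.autograd.grad(loss, list(params.values()), create_graph=True)
        virt = {k: w - 0.1 * g for (k, w), g in zip(params.items(), grads)}
        val_loss = F.cross_entropy(functional_call(model, virt, (xv,)), yv)
        opt_g.zero_grad(); val_loss.backward(); opt_g.step()
    # inner step: train the task model on policy-augmented data
    x_aug = augment(x, policy(x.flatten(1))).detach()
    loss = F.cross_entropy(model(x_aug), y)
    opt_w.zero_grad(); loss.backward()
    nn.utils.clip_grad_norm_(model.parameters(), 5.0)
    opt_w.step()
```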
Open Source Code: Yes. "Code is available at https://github.com/jamestszhim/adaptive_augment"
Open Datasets: Yes. "We search for the optimal augmentation policy on the CIFAR-100 dataset and use the learned policy to train with four fine-grained classification datasets: Oxford 102 Flowers (Nilsback & Zisserman, 2008), Oxford-IIIT Pets (Parkhi et al., 2012), FGVC Aircraft (Maji et al., 2013), and Stanford Cars (Krause et al., 2013). We compare AdaAug-direct with state-of-the-art AutoDA methods using the same evaluation datasets: CIFAR-10, CIFAR-100 (Krizhevsky & Hinton, 2009), and SVHN (Netzer et al., 2011)."
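For reference, all of the datasets named above ship with torchvision (0.13 or later for Flowers102, FGVCAircraft, and StanfordCars). A minimal loading sketch follows; the split names follow torchvision's API, not necessarily the paper's evaluation protocol.

```python
# Minimal loading sketch; torchvision >= 0.13 assumed.
from torchvision import datasets, transforms

tf, root = transforms.ToTensor(), "./data"
flowers  = datasets.Flowers102(root, split="train", transform=tf, download=True)
pets     = datasets.OxfordIIITPet(root, split="trainval", transform=tf, download=True)
aircraft = datasets.FGVCAircraft(root, split="trainval", transform=tf, download=True)
# The original Stanford Cars download URL is offline; place the files under
# `root` manually before constructing the dataset.
cars     = datasets.StanfordCars(root, split="train", transform=tf)
cifar100 = datasets.CIFAR100(root, train=True, transform=tf, download=True)
svhn     = datasets.SVHN(root, split="train", transform=tf, download=True)
```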
Dataset Splits: Yes. "We follow the setup adopted by AutoAugment (Cubuk et al., 2019) to use 4,000 training images for CIFAR-10 and CIFAR-100, and 1,000 training images for SVHN. The remaining images are used as the validation set."
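A minimal sketch of that reduced-data split for CIFAR-10; the random seed and the use of torchvision are my assumptions, not the paper's exact split.

```python
# Reduced-data split: 4,000 training images, the rest held out for validation.
import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms

full = datasets.CIFAR10("./data", train=True, download=True,
                        transform=transforms.ToTensor())
perm = torch.randperm(len(full), generator=torch.Generator().manual_seed(0))  # assumed seed
train_set = Subset(full, perm[:4000].tolist())   # 4,000 training images
val_set   = Subset(full, perm[4000:].tolist())   # remaining 46,000 for validation
```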
Hardware Specification: Yes. "AdaAug takes only 3.3 GPU hours on an old GeForce GTX 1080 GPU card (see Appendix A.4)."
Software Dependencies: No. The paper mentions using the Adam optimizer but does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used.
Experiment Setup: Yes. "We implement h as a linear layer and update the policy parameter γ after every 10 training steps using the Adam optimizer with a learning rate of 0.001 and a batch size of 128. We use cosine learning rate decay with one annealing cycle (Loshchilov & Hutter, 2017), an initial learning rate of 0.1, weight decay of 1e-4, and a gradient clipping parameter of 5."
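Putting those numbers into code, a minimal configuration sketch assuming PyTorch; the stand-in models, the momentum value, the annealing horizon (T_max), and norm-based clipping are my assumptions, not part of the quoted setup. Batches of 128 examples would feed train_step.

```python
# Hyperparameters from the quoted setup; everything marked "assumed" is mine.
import torch
import torch.nn as nn

task_model = nn.Linear(3 * 32 * 32, 10)   # stand-in task network
h = nn.Linear(512, 30)                    # policy head h (a linear layer)

# gamma (policy parameters): Adam, lr 0.001, stepped once every 10 train steps
opt_gamma = torch.optim.Adam(h.parameters(), lr=0.001)

# task network: initial lr 0.1, weight decay 1e-4 (momentum 0.9 assumed)
opt_w = torch.optim.SGD(task_model.parameters(), lr=0.1,
                        momentum=0.9, weight_decay=1e-4)
# cosine decay with a single annealing cycle (200-epoch horizon assumed)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt_w, T_max=200)

def train_step(loss):
    """One task-model update with the reported clipping (norm clipping assumed)."""
    opt_w.zero_grad()
    loss.backward()
    nn.utils.clip_grad_norm_(task_model.parameters(), 5.0)  # clip parameter 5
    opt_w.step()
```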