AdaAug: Learning Class- and Instance-adaptive Data Augmentation Policies
Authors: Tsz-Him Cheung, Dit-Yan Yeung
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that the adaptive augmentation policies learned by our method transfer well to unseen datasets such as the Oxford Flowers, Oxford-IIIT Pets, FGVC Aircraft, and Stanford Cars datasets when compared with other AutoDA baselines. In addition, our method also achieves a state-of-the-art performance on the CIFAR-10, CIFAR-100, and SVHN datasets. |
| Researcher Affiliation | Academia | Tsz-Him Cheung & Dit-Yan Yeung, Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, {thcheungae, dyyeung}@cse.ust.hk |
| Pseudocode | Yes | Algorithm 1 Search algorithm |
| Open Source Code | Yes | Code is available at https://github.com/jamestszhim/adaptive_augment |
| Open Datasets | Yes | We search for the optimal augmentation policy on the CIFAR-100 dataset and use the learned policy to train with four fine-grained classification datasets: Oxford 102 Flowers (Nilsback & Zisserman, 2008), Oxford-IIIT Pets (Em et al., 2017), FGVC Aircraft (Maji et al., 2013), and Stanford Cars (Krause et al., 2013). We compare AdaAug-direct with state-of-the-art AutoDA methods using the same evaluation datasets: CIFAR-10, CIFAR-100 (Krizhevsky & Hinton, 2009), and SVHN (Netzer et al., 2011). |
| Dataset Splits | Yes | We follow the setup adopted by AutoAugment (Cubuk et al., 2019) to use 4,000 training images for CIFAR-10 and CIFAR-100, and 1,000 training images for SVHN. The remaining images are used as the validation set. (A split sketch appears after the table.) |
| Hardware Specification | Yes | AdaAug takes only 3.3 GPU hours on an old GeForce GTX 1080 GPU card (see Appendix A.4). |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer' but does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used. |
| Experiment Setup | Yes | We implement h as a linear layer and update the policy parameter γ after every 10 training steps using the Adam optimizer with a learning rate of 0.001 and a batch size of 128. We use the cosine learning rate decay with one annealing cycle (Loshchilov & Hutter, 2017), initial learning rate of 0.1, weight decay 1e-4 and gradient clipping parameter 5. |
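
To make the Dataset Splits row concrete, below is a minimal sketch of the reduced CIFAR-10 split (4,000 training images, the rest held out for validation). It assumes PyTorch/torchvision, a uniformly random (not necessarily class-balanced) subset, and an arbitrary seed; none of these details come from the paper, and the `reduced_cifar10_split` helper is hypothetical.

```python
# Hedged sketch: reduced CIFAR-10 split as described in the Dataset Splits row
# (4,000 training images, the remaining images used as the validation set).
# torchvision, the random-subset strategy, and the seed are assumptions.
import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms

def reduced_cifar10_split(root="./data", num_train=4000, seed=0):
    base = datasets.CIFAR10(root=root, train=True, download=True,
                            transform=transforms.ToTensor())
    perm = torch.randperm(len(base), generator=torch.Generator().manual_seed(seed))
    train_set = Subset(base, perm[:num_train].tolist())
    val_set = Subset(base, perm[num_train:].tolist())
    return train_set, val_set

train_set, val_set = reduced_cifar10_split()
print(len(train_set), len(val_set))  # 4000 46000
```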
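
The Experiment Setup row likewise maps onto a short training-loop sketch. Only the quoted numbers come from the paper: a linear policy head h trained with Adam at lr 0.001 and batch size 128, policy updates every 10 steps, cosine annealing with one cycle from lr 0.1, weight decay 1e-4, and gradient clipping at 5. The use of SGD with momentum 0.9 for the task model, the placeholder network, the feature/operation dimensions, the 200-epoch horizon, and the omitted validation-based policy loss are assumptions; this is not the authors' implementation (see their repository for that).

```python
# Hedged sketch of the hyperparameters quoted in the Experiment Setup row.
# Quoted: linear policy head h, Adam lr 0.001, policy update every 10 steps,
# batch size 128, one cosine annealing cycle from lr 0.1, weight decay 1e-4,
# gradient clipping at 5. Everything else (SGD with momentum 0.9, placeholder
# model, feature/operation sizes, 200-epoch horizon) is assumed for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim, num_ops = 512, 36                      # assumed sizes, not from the paper
h = nn.Linear(feat_dim, num_ops)                 # policy head: features -> op weights
policy_opt = torch.optim.Adam(h.parameters(), lr=1e-3)

task_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # placeholder net
task_opt = torch.optim.SGD(task_model.parameters(), lr=0.1,
                           momentum=0.9, weight_decay=1e-4)
# One annealing cycle over training; call scheduler.step() once per epoch.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(task_opt, T_max=200)

def task_step(x, y):
    """One task-model update with the quoted gradient-clipping value."""
    task_opt.zero_grad()
    loss = F.cross_entropy(task_model(x), y)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(task_model.parameters(), max_norm=5.0)
    task_opt.step()
    return loss.item()

def policy_step(features, policy_loss_fn):
    """Policy-head update, run after every 10 task steps as quoted; the
    validation-based policy loss itself is not reproduced here."""
    policy_opt.zero_grad()
    loss = policy_loss_fn(h(features))
    loss.backward()
    policy_opt.step()
    return loss.item()
```

The two-optimizer structure reflects why the Dataset Splits row holds out most of the training images: the task model consumes the small training split, while the held-out validation images would drive the policy loss, which is therefore left as a callable here.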