AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty

Authors: Dan Hendrycks*, Norman Mu*, Ekin Dogus Cubuk, Barret Zoph, Justin Gilmer, Balaji Lakshminarayanan

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | AUGMIX significantly improves robustness and uncertainty measures on challenging image classification benchmarks, closing the gap between previous methods and the best possible performance in some cases by more than half.
Researcher Affiliation | Collaboration | Dan Hendrycks, DeepMind, hendrycks@berkeley.edu; Norman Mu, Google, normanmu@google.com; Ekin D. Cubuk, Google, cubuk@google.com; Barret Zoph, Google, barretzoph@google.com; Justin Gilmer, Google, gilmer@google.com; Balaji Lakshminarayanan, DeepMind, balajiln@google.com
Pseudocode | Yes | Algorithm AUGMIX Pseudocode
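The paper's pseudocode itself is not reproduced in this report. As a point of reference, a minimal NumPy sketch of the AugMix mixing scheme (several augmentation chains combined with Dirichlet-sampled weights, then blended with the clean image via a Beta-sampled coefficient) could look like the following; the `augment` callable is a placeholder standing in for the paper's augmentation operations, and the parameter defaults here are illustrative assumptions, not the authors' exact settings.

```python
import numpy as np

def augmix(image, augment, severity=3, width=3, depth=-1, alpha=1.0, rng=None):
    """Sketch of AugMix-style mixing.

    image:   float array (e.g. H x W x C), the clean input.
    augment: callable (image, severity) -> image; placeholder for one
             randomly chosen augmentation operation.
    width:   number of parallel augmentation chains to mix.
    depth:   chain length; if <= 0, a random length in [1, 3] is drawn.
    alpha:   Dirichlet/Beta concentration parameter.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    # Convex weights over the augmentation chains.
    ws = rng.dirichlet([alpha] * width)
    # Blend coefficient between the clean image and the mixture.
    m = rng.beta(alpha, alpha)

    mix = np.zeros_like(image, dtype=np.float64)
    for w in ws:
        d = depth if depth > 0 else int(rng.integers(1, 4))
        aug = image.astype(np.float64)
        for _ in range(d):
            aug = augment(aug, severity)  # compose operations along the chain
        mix += w * aug

    # Interpolate between the original image and the mixed augmentations.
    return (1 - m) * image + m * mix
```

In training, the paper pairs this mixing with a Jensen-Shannon consistency loss across augmented views; that part is omitted from this sketch.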
Open Source Code | Yes | Code is available at https://github.com/google-research/augmix.
Open Datasets | Yes | The two CIFAR (Krizhevsky & Hinton, 2009) datasets contain small 32×32×3 color natural images, both with 50,000 training images and 10,000 testing images. The ImageNet (Deng et al., 2009) dataset contains 1,000 classes of approximately 1.2 million large-scale color images.
Dataset Splits | No | The paper states the number of training and testing images for CIFAR datasets but does not explicitly mention a separate validation split or its size: "The two CIFAR (Krizhevsky & Hinton, 2009) datasets contain small 32×32×3 color natural images, both with 50,000 training images and 10,000 testing images."
Hardware Specification | No | The paper does not provide specific details about the hardware used for its experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | All networks use an initial learning rate of 0.1 which decays following a cosine learning rate (Loshchilov & Hutter, 2016). All input images are pre-processed with standard random left-right flipping and cropping prior to any augmentations. ... The All Convolutional Network and Wide ResNet train for 100 epochs, and the DenseNet and ResNeXt require 200 epochs for convergence. We optimize with stochastic gradient descent using Nesterov momentum. Following Zhang et al. (2017); Guo et al. (2019), we use a weight decay of 0.0001 for Mixup and 0.0005 otherwise.
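The quoted setup (initial learning rate 0.1 with cosine decay) can be expressed as a small schedule function. This is a sketch of the standard cosine annealing formula from Loshchilov & Hutter (2016), not code from the paper, and it assumes a single decay cycle from `base_lr` down to zero with no warm restarts.

```python
import math

def cosine_lr(epoch, total_epochs, base_lr=0.1):
    """Cosine-annealed learning rate: starts at base_lr at epoch 0
    and decays smoothly to 0 at epoch == total_epochs."""
    return 0.5 * base_lr * (1 + math.cos(math.pi * epoch / total_epochs))
```

For example, with `total_epochs=100` (the Wide ResNet setting quoted above), the rate is 0.1 at epoch 0, roughly 0.05 at epoch 50, and approaches 0 by epoch 100. In a PyTorch training loop the same schedule is typically obtained via `torch.optim.lr_scheduler.CosineAnnealingLR`.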