AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty
Authors: Dan Hendrycks*, Norman Mu*, Ekin Dogus Cubuk, Barret Zoph, Justin Gilmer, Balaji Lakshminarayanan
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | AUGMIX significantly improves robustness and uncertainty measures on challenging image classification benchmarks, closing the gap between previous methods and the best possible performance in some cases by more than half. |
| Researcher Affiliation | Collaboration | Dan Hendrycks (DeepMind) hendrycks@berkeley.edu; Norman Mu (Google) normanmu@google.com; Ekin D. Cubuk (Google) cubuk@google.com; Barret Zoph (Google) barretzoph@google.com; Justin Gilmer (Google) gilmer@google.com; Balaji Lakshminarayanan (DeepMind) balajiln@google.com |
| Pseudocode | Yes | Algorithm AUGMIX Pseudocode (an illustrative sketch of the augment-and-mix procedure appears after this table). |
| Open Source Code | Yes | Code is available at https://github.com/google-research/augmix. |
| Open Datasets | Yes | The two CIFAR (Krizhevsky & Hinton, 2009) datasets contain small 32×32×3 color natural images, both with 50,000 training images and 10,000 testing images. The ImageNet (Deng et al., 2009) dataset contains 1,000 classes of approximately 1.2 million large-scale color images. |
| Dataset Splits | No | The paper states the number of training and testing images for the CIFAR datasets but does not explicitly mention a separate validation split or its size: "The two CIFAR (Krizhevsky & Hinton, 2009) datasets contain small 32×32×3 color natural images, both with 50,000 training images and 10,000 testing images." |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for its experiments (e.g., GPU models, CPU types, or memory specifications). |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | All networks use an initial learning rate of 0.1 which decays following a cosine learning rate schedule (Loshchilov & Hutter, 2016). All input images are pre-processed with standard random left-right flipping and cropping prior to any augmentations. ... The All Convolutional Network and WideResNet train for 100 epochs, and the DenseNet and ResNeXt require 200 epochs for convergence. We optimize with stochastic gradient descent using Nesterov momentum. Following Zhang et al. (2017); Guo et al. (2019), we use a weight decay of 0.0001 for Mixup and 0.0005 otherwise. (An illustrative optimizer and schedule sketch appears after this table.) |
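The pseudocode row refers to the paper's Algorithm AUGMIX (augment-and-mix). The following is a minimal Python sketch of that procedure, not the released google-research/augmix code: the mixing structure and default parameters (width 3, chain depth 1–3, Dirichlet/Beta parameter α = 1) follow the paper's description, while the specific PIL operations listed here are simplified placeholders chosen for illustration.

```python
# Illustrative sketch of AUGMIX's augment-and-mix (not the released implementation).
import numpy as np
from PIL import Image, ImageOps

def rotate(img):
    return img.rotate(float(np.random.uniform(-30, 30)))

def posterize(img):
    return ImageOps.posterize(img, int(np.random.randint(4, 8)))

def solarize(img):
    return ImageOps.solarize(img, int(np.random.randint(128, 256)))

def autocontrast(img):
    return ImageOps.autocontrast(img)

def equalize(img):
    return ImageOps.equalize(img)

# Placeholder operation set; the paper uses AutoAugment-style operations
# chosen so that they do not overlap with the test-time corruptions.
OPERATIONS = [rotate, posterize, solarize, autocontrast, equalize]

def augment_and_mix(image, width=3, depth=-1, alpha=1.0):
    """Mix `width` randomly composed augmentation chains, then interpolate
    the mixture with the original image via a skip connection."""
    ws = np.random.dirichlet([alpha] * width)   # convex mixing weights w_1..w_k
    m = np.random.beta(alpha, alpha)            # skip-connection weight

    mix = np.zeros_like(np.asarray(image, dtype=np.float32))
    for i in range(width):
        image_aug = image.copy()
        d = depth if depth > 0 else np.random.randint(1, 4)  # chain length 1-3
        for _ in range(d):
            op = OPERATIONS[np.random.randint(len(OPERATIONS))]
            image_aug = op(image_aug)
        mix += ws[i] * np.asarray(image_aug, dtype=np.float32)

    # x_augmix = m * x_orig + (1 - m) * x_aug
    mixed = m * np.asarray(image, dtype=np.float32) + (1 - m) * mix
    return Image.fromarray(np.uint8(np.clip(mixed, 0, 255)))

# Example usage (hypothetical file name):
# augmented = augment_and_mix(Image.open("example.png").convert("RGB"))
```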
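The experiment-setup row quotes SGD with Nesterov momentum, an initial learning rate of 0.1 with cosine decay, and a weight decay of 0.0005 (0.0001 for Mixup). A minimal PyTorch sketch of that configuration follows; the momentum value of 0.9 is an assumption, since the paper only states that Nesterov momentum is used.

```python
# Sketch of the quoted optimizer/schedule configuration (assumptions noted inline).
import torch

def make_optimizer_and_scheduler(model, epochs=100, mixup=False):
    """SGD with Nesterov momentum, cosine-decayed LR starting at 0.1,
    and the weight decay quoted above (1e-4 for Mixup, 5e-4 otherwise)."""
    optimizer = torch.optim.SGD(
        model.parameters(),
        lr=0.1,
        momentum=0.9,          # assumed value; the paper only says Nesterov momentum
        nesterov=True,
        weight_decay=1e-4 if mixup else 5e-4,
    )
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    return optimizer, scheduler
```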