Maximum-Entropy Adversarial Data Augmentation for Improved Generalization and Robustness
Authors: Long Zhao, Ting Liu, Xi Peng, Dimitris Metaxas
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on three standard benchmarks demonstrate that our method consistently outperforms the existing state of the art by a statistically significant margin. Our code is available at https://github.com/garyzhao/ME-ADA. ... In this section, we evaluate our approach over a variety of settings. We first test with MNIST under the setting of large domain shifts, and then test on a more challenging dataset, with PACS data under the domain generalization setting. Further, we test on CIFAR-10-C and CIFAR-100-C which are standard benchmarks for evaluating model robustness to common corruptions. |
| Researcher Affiliation | Collaboration | Long Zhao¹, Ting Liu², Xi Peng³, Dimitris Metaxas¹ (¹Rutgers University, ²Google Research, ³University of Delaware) |
| Pseudocode | Yes | Algorithm 1 Max-Entropy Adversarial Data Augmentation (ME-ADA). A hedged code sketch of this algorithm is given after the table. |
| Open Source Code | Yes | Our code is available at https://github.com/garyzhao/ME-ADA. |
| Open Datasets | Yes | MNIST dataset [38] consists of handwritten digits with 60,000 training examples and 10,000 testing examples. Other digit datasets, including SVHN [45], MNIST-M [21], SYN [21] and USPS [14], are leveraged for evaluating model performance. ... PACS [39] is a recent dataset... CIFAR-10 and CIFAR-100 are two datasets [31] containing small 32 x 32 natural RGB images, both with 50,000 training images and 10,000 testing images. |
| Dataset Splits | Yes | For fair comparison, we follow the protocol in [39] including the recommended train, validation and test split. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., CPU, GPU models, or cloud instance types) used to run its experiments. |
| Software Dependencies | No | The paper mentions optimizers (Adam, SGD) and neural network architectures (LeNet, AlexNet, All Convolutional Network, DenseNet, Wide ResNet, ResNeXt) by name, along with their respective citations. However, it does not provide specific version numbers for any software dependencies, such as programming languages, deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries. |
| Experiment Setup | Yes | We use Adam [29] with α = 0.0001 for minimization and SGD with η = 1.0 for maximization. We set T_min = 100, T_max = 15, γ = 1.0, β = 10.0 and K = 3. ... We train all networks with an initial learning rate of 0.1 optimized by SGD using Nesterov momentum, and the learning rate decays following a cosine annealing schedule [42]. ... We train AllConvNet and WideResNet for 100 epochs; DenseNet and ResNeXt require 200 epochs for convergence. Following the setting of [25], we use a weight decay of 0.0001 for DenseNet and 0.0005 otherwise. |
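
The Pseudocode row above cites Algorithm 1 (ME-ADA), and the Experiment Setup row reports the hyperparameters used with it (η = 1.0, γ = 1.0, β = 10.0, T_max = 15, K = 3). Below is a minimal sketch of one maximization phase of that algorithm, assuming a PyTorch-style implementation; `me_ada_maximization` and `model.features` (a feature-space hook for the semantic distance penalty) are illustrative names, not taken from the authors' released code.

```python
import torch
import torch.nn.functional as F

def me_ada_maximization(model, x, y, eta=1.0, gamma=1.0, beta=10.0, t_max=15):
    """One maximization phase of ME-ADA (illustrative sketch, not the authors' code).

    Perturbs inputs by gradient ascent on the classification loss plus a
    prediction-entropy bonus (the maximum-entropy term), while a feature-space
    distance penalty keeps the fictitious examples close to the source data.
    Defaults mirror the hyperparameters reported in the Experiment Setup row.
    """
    model.eval()
    with torch.no_grad():
        feat_clean = model.features(x)            # assumed feature-extractor hook
    x_adv = x.clone().detach().requires_grad_(True)
    opt = torch.optim.SGD([x_adv], lr=eta)        # SGD with eta = 1.0 for maximization
    for _ in range(t_max):                        # T_max = 15 ascent iterations
        logits = model(x_adv)
        ce = F.cross_entropy(logits, y)
        probs = F.softmax(logits, dim=1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1).mean()
        dist = F.mse_loss(model.features(x_adv), feat_clean)
        # negate because the optimizer minimizes: ascend ce + beta*H - gamma*dist
        loss = -(ce + beta * entropy - gamma * dist)
        opt.zero_grad()
        loss.backward()                           # model parameters are not updated here
        opt.step()
    return x_adv.detach()
```

In the full procedure this phase would be run K = 3 times, each followed by T_min = 100 minimization steps (Adam with α = 0.0001) on the union of the original and generated examples; that outer loop is omitted from the sketch.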
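
The CIFAR training configuration quoted in the Experiment Setup row (SGD with Nesterov momentum, initial learning rate 0.1, cosine annealing, architecture-dependent weight decay and epoch counts) can be expressed roughly as the PyTorch setup below. The momentum value of 0.9 is an assumption, since the paper only specifies Nesterov momentum, and the function name is illustrative.

```python
import torch

def build_cifar_optimizer(model, arch="wideresnet"):
    """Optimizer/schedule sketch matching the reported CIFAR-10-C/100-C setup.

    Weight decay is 1e-4 for DenseNet and 5e-4 otherwise; AllConvNet and
    WideResNet train for 100 epochs, DenseNet and ResNeXt for 200.
    """
    epochs = 200 if arch in ("densenet", "resnext") else 100
    weight_decay = 1e-4 if arch == "densenet" else 5e-4
    optimizer = torch.optim.SGD(
        model.parameters(),
        lr=0.1,                 # initial learning rate from the paper
        momentum=0.9,           # assumed value; the paper only says Nesterov momentum
        nesterov=True,
        weight_decay=weight_decay,
    )
    # cosine annealing of the learning rate over the full training run
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    return optimizer, scheduler, epochs
```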