Understanding the Impact of Adversarial Robustness on Accuracy Disparity
Authors: Yuzheng Hu, Fan Wu, Hongyang Zhang, Han Zhao
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We additionally perform experiments on both synthetic and real-world datasets to corroborate our theoretical findings. |
| Researcher Affiliation | Academia | 1Department of Computer Science, University of Illinois at Urbana Champaign, Urbana, IL, USA 2David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, Canada. |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is publicly available on GitHub: https://github.com/Accuracy-Disparity/AT-on-AD |
| Open Datasets | Yes | For the real-world datasets MNIST (LeCun et al., 1998) and CIFAR-10 (Krizhevsky et al., 2009). ... Results on additional datasets, including two synthetic datasets featuring stable distributions (Cauchy and Holtsmark), as well as two more real-world datasets, Fashion-MNIST (Xiao et al., 2017) and ImageNet (Deng et al., 2009). |
| Dataset Splits | Yes | For each dataset in each dataset group, we split the dataset into three disjoint partitions: training, validation, and testing. For the synthetic dataset, we set the ratio of the three partitions to be 8:1:1, which gives us a total of 8000 training samples, 1000 validation samples, and 1000 testing samples for the majority class in each dataset. ... For the real-world datasets MNIST, Fashion-MNIST and CIFAR, since the datasets are originally split into training and testing, we further split the training set into training and validation with a ratio of 8:1. (An illustrative split sketch follows the table.) |
| Hardware Specification | Yes | We perform experiments on a machine with AMD EPYC 7352 24-Core Processor CPU and 8 NVIDIA RTX A6000 GPUs. |
| Software Dependencies | No | The paper mentions optimizers (SGD, Adam) and methods (FGM, PGD) but does not list specific software libraries or frameworks with their version numbers (e.g., Python, PyTorch/TensorFlow versions). |
| Experiment Setup | Yes | We perform a grid search for the hyper-parameters including learning rate, batch size, and hidden layer size (when applicable) based on the model's performance on the validation set. The search space for learning rate is {0.1, 0.05, 0.01, 0.005, 0.001, 0.0005, 0.0002, 0.0001}, for batch size is {32, 64, 128}, and for hidden layer size is {100, 200, 500, 1000, 2000}. For each model, we perform training for a maximum of 500 training epochs; we keep track of the best model throughout the training based on the validation loss and apply early stopping (Prechelt, 1998) when the lowest validation loss does not decrease for the past 50 epochs. ... For the PGD attack specifically, we set the step number to 50 and the per-step size to 2.5·ε/50, following Madry et al. (2018). (Sketches of the grid search with early stopping and of the PGD configuration follow the table.) |
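
The quoted splits (8:1:1 for the synthetic data, and 8:1 train/validation over the official training sets of MNIST, Fashion-MNIST, and CIFAR) can be reproduced along the following lines. This is a minimal sketch, not the authors' code: `split_synthetic` and the fixed seed are illustrative assumptions; only standard `torch`/`torchvision` utilities are used.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Hypothetical 8:1:1 split for a synthetic dataset; with 10,000
# majority-class samples this yields the quoted 8000/1000/1000 counts.
def split_synthetic(dataset, ratios=(0.8, 0.1, 0.1), seed=0):
    n = len(dataset)
    sizes = [int(r * n) for r in ratios]
    sizes[-1] = n - sum(sizes[:-1])  # absorb rounding into the test split
    gen = torch.Generator().manual_seed(seed)
    return random_split(dataset, sizes, generator=gen)

# Real-world datasets: split the official train set 8:1 into
# train/validation; keep the official test set as-is.
mnist_train = datasets.MNIST("data", train=True, download=True,
                             transform=transforms.ToTensor())
n_val = len(mnist_train) // 9            # 8:1 ratio => 1/9 held out
n_train = len(mnist_train) - n_val
train_set, val_set = random_split(
    mnist_train, [n_train, n_val],
    generator=torch.Generator().manual_seed(0))
test_set = datasets.MNIST("data", train=False, download=True,
                          transform=transforms.ToTensor())
```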
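The grid search and patience-50 early stopping from the Experiment Setup row could look like the sketch below. `make_model`, `train_one_epoch`, and `validate` are hypothetical placeholders for the paper's training and validation routines; the search space values are the ones quoted above.

```python
import itertools

# Search space quoted from the paper.
LRS = [0.1, 0.05, 0.01, 0.005, 0.001, 0.0005, 0.0002, 0.0001]
BATCH_SIZES = [32, 64, 128]
HIDDEN_SIZES = [100, 200, 500, 1000, 2000]

def train_with_early_stopping(lr, batch_size, hidden,
                              max_epochs=500, patience=50):
    model = make_model(hidden)                  # hypothetical helper
    best_loss, best_state, stale = float("inf"), None, 0
    for epoch in range(max_epochs):
        train_one_epoch(model, lr, batch_size)  # hypothetical helper
        val_loss = validate(model)              # hypothetical helper
        if val_loss < best_loss:                # track the best model so far
            best_loss, best_state, stale = val_loss, model.state_dict(), 0
        else:
            stale += 1
            if stale >= patience:               # stop after 50 stale epochs
                break
    model.load_state_dict(best_state)           # restore the best checkpoint
    return model, best_loss

best_loss, best_cfg = float("inf"), None
for lr, bs, h in itertools.product(LRS, BATCH_SIZES, HIDDEN_SIZES):
    _, loss = train_with_early_stopping(lr, bs, h)
    if loss < best_loss:
        best_loss, best_cfg = loss, (lr, bs, h)
```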
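The quoted PGD configuration (50 steps, per-step size 2.5·ε/50, following Madry et al., 2018) is standard; a minimal L∞ sketch is below. This is an assumption about the attack's shape, not the authors' implementation, and the paper also mentions FGM, whose single-step L2 variant differs.

```python
import torch

def pgd_attack(model, loss_fn, x, y, eps, steps=50):
    alpha = 2.5 * eps / steps                 # per-step size from the paper
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()       # L-inf ascent step
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project to eps-ball
            x_adv = x_adv.clamp(0, 1)                 # keep valid pixel range
    return x_adv.detach()
```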