Understanding the Impact of Adversarial Robustness on Accuracy Disparity
Authors: Yuzheng Hu, Fan Wu, Hongyang Zhang, Han Zhao
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We additionally perform experiments on both synthetic and real-world datasets to corroborate our theoretical findings. |
| Researcher Affiliation | Academia | 1Department of Computer Science, University of Illinois at Urbana Champaign, Urbana, IL, USA 2David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, Canada. |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is publicly available on GitHub: https://github.com/Accuracy-Disparity/AT-on-AD |
| Open Datasets | Yes | For the real-world datasets MNIST (LeCun et al., 1998) and CIFAR-10 (Krizhevsky et al., 2009). ... Results on additional datasets, including two synthetic datasets featuring stable distributions (Cauchy and Holtsmark), as well as two more real-world datasets, Fashion-MNIST (Xiao et al., 2017) and ImageNet (Deng et al., 2009). |
| Dataset Splits | Yes | For each dataset in each dataset group, we split the dataset into three disjoint partitions: training, validation, and testing. For the synthetic dataset, we set the ratio of the three partitions to be 8:1:1, which gives us a total of 8000 training samples, 1000 validation samples, and 1000 testing samples for the majority class in each dataset. ... For the real-world datasets MNIST, Fashion-MNIST and CIFAR, since the datasets are originally split into training and testing, we further split the training set into training and validation with a ratio of 8:1. (An illustrative split sketch follows the table.) |
| Hardware Specification | Yes | We perform experiments on a machine with AMD EPYC 7352 24-Core Processor CPU and 8 NVIDIA RTX A6000 GPUs. |
| Software Dependencies | No | The paper mentions optimizers (SGD, Adam) and methods (FGM, PGD) but does not list specific software libraries or frameworks with their version numbers (e.g., Python, PyTorch/TensorFlow versions). |
| Experiment Setup | Yes | We perform a grid search for the hyper-parameters including learning rate, batch size, and hidden layer size (when applicable) based on the model's performance on the validation set. The search space for learning rate is {0.1, 0.05, 0.01, 0.005, 0.001, 0.0005, 0.0002, 0.0001}, for batch size is {32, 64, 128}, and for hidden layer size is {100, 200, 500, 1000, 2000}. For each model, we perform training for a maximum of 500 training epochs; we keep track of the best model throughout the training based on the validation loss and apply early stopping (Prechelt, 1998) when the lowest validation loss does not decrease for the past 50 epochs. ... For the PGD attack specifically, we set the step number to 50 and the per-step size to 2.5·ε/50, following Madry et al. (2018). (Sketches of the grid search with early stopping and of the PGD configuration follow the table.) |
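
The quoted splits (8:1:1 for the synthetic data, and 8:1 train/validation over the official training sets of MNIST, Fashion-MNIST, and CIFAR) can be reproduced along the following lines. This is a minimal sketch, not the authors' code: `split_synthetic` and the fixed seed are illustrative assumptions; only standard `torch`/`torchvision` utilities are used.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Hypothetical 8:1:1 split for a synthetic dataset; with 10,000
# majority-class samples this yields the quoted 8000/1000/1000 counts.
def split_synthetic(dataset, ratios=(0.8, 0.1, 0.1), seed=0):
    n = len(dataset)
    sizes = [int(r * n) for r in ratios]
    sizes[-1] = n - sum(sizes[:-1])  # absorb rounding into the test split
    gen = torch.Generator().manual_seed(seed)
    return random_split(dataset, sizes, generator=gen)

# Real-world datasets: split the official train set 8:1 into
# train/validation; keep the official test set as-is.
mnist_train = datasets.MNIST("data", train=True, download=True,
                             transform=transforms.ToTensor())
n_val = len(mnist_train) // 9            # 8:1 ratio => 1/9 held out
n_train = len(mnist_train) - n_val
train_set, val_set = random_split(
    mnist_train, [n_train, n_val],
    generator=torch.Generator().manual_seed(0))
test_set = datasets.MNIST("data", train=False, download=True,
                          transform=transforms.ToTensor())
```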
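The grid search and patience-50 early stopping from the Experiment Setup row could look like the sketch below. `make_model`, `train_one_epoch`, and `validate` are hypothetical placeholders for the paper's training and validation routines; the search space values are the ones quoted above.

```python
import itertools

# Search space quoted from the paper.
LRS = [0.1, 0.05, 0.01, 0.005, 0.001, 0.0005, 0.0002, 0.0001]
BATCH_SIZES = [32, 64, 128]
HIDDEN_SIZES = [100, 200, 500, 1000, 2000]

def train_with_early_stopping(lr, batch_size, hidden,
                              max_epochs=500, patience=50):
    model = make_model(hidden)                  # hypothetical helper
    best_loss, best_state, stale = float("inf"), None, 0
    for epoch in range(max_epochs):
        train_one_epoch(model, lr, batch_size)  # hypothetical helper
        val_loss = validate(model)              # hypothetical helper
        if val_loss < best_loss:                # track the best model so far
            best_loss, best_state, stale = val_loss, model.state_dict(), 0
        else:
            stale += 1
            if stale >= patience:               # stop after 50 stale epochs
                break
    model.load_state_dict(best_state)           # restore the best checkpoint
    return model, best_loss

best_loss, best_cfg = float("inf"), None
for lr, bs, h in itertools.product(LRS, BATCH_SIZES, HIDDEN_SIZES):
    _, loss = train_with_early_stopping(lr, bs, h)
    if loss < best_loss:
        best_loss, best_cfg = loss, (lr, bs, h)
```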
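The quoted PGD configuration (50 steps, per-step size 2.5·ε/50, following Madry et al., 2018) is standard; a minimal L∞ sketch is below. This is an assumption about the attack's shape, not the authors' implementation, and the paper also mentions FGM, whose single-step L2 variant differs.

```python
import torch

def pgd_attack(model, loss_fn, x, y, eps, steps=50):
    alpha = 2.5 * eps / steps                 # per-step size from the paper
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()       # L-inf ascent step
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project to eps-ball
            x_adv = x_adv.clamp(0, 1)                 # keep valid pixel range
    return x_adv.detach()
```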