Doubly Robust Instance-Reweighted Adversarial Training

Authors: Daouda Sow, Sen Lin, Zhangyang Wang, Yingbin Liang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on standard classification datasets demonstrate that our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance, and at the same time improves the robustness against attacks on the weakest data points." (Section 4, Experiments)
Researcher Affiliation | Academia | Daouda A. Sow, Department of ECE, The Ohio State University (sow.53@osu.edu); Sen Lin, Department of CS, University of Houston (slin50@central.uh.edu); Zhangyang Wang, Visual Informatics Group, University of Texas at Austin (atlaswang@utexas.edu); Yingbin Liang, Department of ECE, The Ohio State University (liang.889@osu.edu)
Pseudocode | Yes | "Algorithm 1 Compositional Implicit Differentiation (CID)" (see the illustrative reweighting sketch after the table)
Open Source Code | Yes | "Pytorch codes for our method are provided in the supplementary material of our submission."
Open Datasets | Yes | "We consider image classification problems and compare the performance of the baselines on four datasets: CIFAR10 Krizhevsky & Hinton (2009), SVHN Netzer et al. (2011), STL10 Coates et al. (2011), and GTSRB Stallkamp et al. (2012)."
Dataset Splits | Yes | "All hyperparameters were fixed by holding out 10% of the training data as a validation set and selecting the values that achieve the best performance on the validation set. ... For CIFAR10, SVHN, and STL10 we use the training and test splits provided by Torchvision." (see the data-loading sketch after the table)
Hardware Specification | Yes | "We run all baselines on a single NVIDIA Tesla V100 GPU."
Software Dependencies | Yes | "All codes are tested with Python 3.7 and Pytorch 1.8."
Experiment Setup | Yes | "More details about the training and hyperparameters search can be found in Appendix B. ... we train our baselines using stochastic gradient descent with a minibatch size of 128 and a momentum of 0.9. We use ResNet-18 as the backbone network as in Madry et al. (2017) and train our baselines for 60 epochs with a cyclic learning rate schedule where the maximum learning rate is set to 0.2. ... For the KL-divergence regularization parameter r in our algorithms, we use a decayed schedule where we initially set it to 10 and decay it to 1 and 0.1, respectively at epochs 40 and 50 (see fig. 2)." (see the training-setup sketch after the table)
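
The dataset and split details above map onto a fairly standard Torchvision pipeline. The sketch below is an illustration of that description, not the authors' released code: it loads CIFAR10 through Torchvision, holds out 10% of the training data as a validation set, and builds loaders with the reported minibatch size of 128. The transform, random seed, and data directory are assumptions.

```python
# Illustrative sketch of the reported data setup (not the authors' released code):
# Torchvision CIFAR10 train/test splits, with 10% of the training data held out
# as a validation set for hyperparameter selection, and minibatch size 128.
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

transform = transforms.ToTensor()  # assumed; the paper's augmentations may differ

full_train = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)

val_size = len(full_train) // 10                                    # 10% held out
train_set, val_set = random_split(
    full_train,
    [len(full_train) - val_size, val_size],
    generator=torch.Generator().manual_seed(0),                     # assumed seed
)

train_loader = DataLoader(train_set, batch_size=128, shuffle=True)  # minibatch size 128 (reported)
val_loader = DataLoader(val_set, batch_size=128)
test_loader = DataLoader(test_set, batch_size=128)
```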
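The experiment-setup row reports SGD with momentum 0.9, a ResNet-18 backbone, 60 epochs, a cyclic learning rate schedule peaking at 0.2, and a decayed schedule for the KL-divergence regularization parameter r. A minimal sketch of that configuration follows; the choice of OneCycleLR for the cyclic schedule, the unmodified torchvision ResNet-18, and the step count per epoch are assumptions, since the exact schedule shape is not given in the excerpt.

```python
# Minimal sketch of the reported training configuration (assumptions noted in comments).
import torch
from torchvision.models import resnet18

model = resnet18(num_classes=10)                 # backbone; CIFAR-specific stem tweaks are omitted (assumption)
optimizer = torch.optim.SGD(model.parameters(), lr=0.2, momentum=0.9)  # momentum 0.9 (reported)

epochs = 60
steps_per_epoch = 352                            # e.g. 45,000 train images / batch 128 (illustrative)
scheduler = torch.optim.lr_scheduler.OneCycleLR( # one possible "cyclic" schedule; exact shape is an assumption
    optimizer, max_lr=0.2, epochs=epochs, steps_per_epoch=steps_per_epoch
)

def kl_reg_r(epoch: int) -> float:
    """Decayed schedule for the KL-regularization parameter r: 10 -> 1 -> 0.1 at epochs 40 and 50."""
    if epoch < 40:
        return 10.0
    if epoch < 50:
        return 1.0
    return 0.1
```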
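The paper's Algorithm 1 (CID) is only named in the excerpt and is not reproduced in this report. As a generic, hedged illustration of what a KL-regularized instance-reweighting step can look like (the quantity the r schedule above controls), per-example adversarial losses can be converted into normalized weights via a softmax with temperature r, so that decaying r shifts weight toward the hardest examples. The function names below are hypothetical.

```python
# Generic illustration, NOT the paper's Algorithm 1 (CID): KL-regularized instance
# reweighting assigns each example a weight proportional to exp(loss_i / r),
# i.e. a softmax over per-example losses with temperature r. Smaller r (e.g. the
# decayed 10 -> 1 -> 0.1 schedule) concentrates weight on the hardest examples.
import torch

def instance_weights(per_example_losses: torch.Tensor, r: float) -> torch.Tensor:
    """Normalized weights over a batch of per-example losses (weights sum to 1)."""
    return torch.softmax(per_example_losses.detach() / r, dim=0)

def reweighted_loss(per_example_losses: torch.Tensor, r: float) -> torch.Tensor:
    """Weighted sum of per-example (e.g. adversarial) losses used for the model update."""
    weights = instance_weights(per_example_losses, r)
    return (weights * per_example_losses).sum()
```

With r = 10 the weights stay close to uniform across the batch, while with r = 0.1 they concentrate almost entirely on the highest-loss examples, which matches the intuition behind decaying r late in training.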