Overfitting in adversarially robust deep learning

Authors: Leslie Rice, Eric Wong, Zico Kolter

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we empirically study this phenomenon in the setting of adversarially trained deep networks, which are trained to minimize the loss under worst-case adversarial perturbations. We find that overfitting to the training set does in fact harm robust performance to a very large degree in adversarially robust training across multiple datasets (SVHN, CIFAR-10, CIFAR-100, and ImageNet) and perturbation models (ℓ∞ and ℓ2).
Researcher Affiliation | Academia | 1Computer Science Department, Carnegie Mellon University, Pittsburgh PA, USA 2Machine Learning Department, Carnegie Mellon University, Pittsburgh PA, USA. Correspondence to: Leslie Rice <larice@cs.cmu.edu>, Eric Wong <ericwong@cs.cmu.edu>.
Pseudocode | No | The paper describes algorithmic steps using mathematical notation within the text but does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | Yes | All code for reproducing the experiments as well as pretrained model weights and training logs can be found at https://github.com/locuslab/robust_overfitting.
Open Datasets | Yes | We find that overfitting to the training set does in fact harm robust performance to a very large degree in adversarially robust training across multiple datasets (SVHN, CIFAR-10, CIFAR-100, and ImageNet) and perturbation models (ℓ∞ and ℓ2).
Dataset Splits | Yes | By holding out 1,000 examples from the CIFAR-10 training set for validation purposes, we use validation-based early stopping to achieve 46.9% robust error on the test set without looking at the test set... (a hold-out split sketch follows the table)
Hardware Specification | Yes | All experiments in this section were run with one GeForce RTX 2080ti unless a Wide ResNet was trained, in which case two GPUs were used.
Software Dependencies | No | The paper mentions software such as the TRADES implementation, PyTorch (implied), and the MadryLab framework, but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | Specifically, when using the same architecture (a Wide ResNet with depth 28 and width factor 10) and the same 20-step PGD adversary for evaluation used by Zhang et al. (2019c) for TRADES... PGD-based adversarial training with a 10-step adversary with step size 2/255 using a pre-activation ResNet18 (He et al., 2016) (details for the training procedure and the PGD adversary can be found in Appendix D.1). ...The learning rate is decayed at 100 and 150 epochs. (a training-loop sketch follows the table)
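
As a rough illustration of the split quoted in the Dataset Splits row, the 1,000-example validation set can be carved out of CIFAR-10's 50,000 training examples along the following lines. This is a sketch, not the authors' code from the repository above; the random seed, batch size, and transform are illustrative assumptions:

```python
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# Hold out 1,000 of CIFAR-10's 50,000 training examples for
# validation-based early stopping, as described in the paper.
# Seed, batch size, and transform are assumed, not the authors' values.
full_train = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())
gen = torch.Generator().manual_seed(0)  # assumed seed, for reproducibility
train_set, val_set = random_split(full_train, [49_000, 1_000], generator=gen)

train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
val_loader = DataLoader(val_set, batch_size=128, shuffle=False)
```

Robust error on val_loader can then be tracked after each epoch, keeping the checkpoint with the lowest value as the early-stopped model, without ever touching the test set.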
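
The Experiment Setup row quotes a 10-step ℓ∞ PGD adversary with step size 2/255 and a learning-rate decay at epochs 100 and 150. Below is a minimal PyTorch sketch of such a training loop; the perturbation radius (8/255), epoch count, optimizer hyperparameters, and model constructor are assumptions based on common CIFAR-10 practice, not the paper's verbatim configuration:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """l_inf PGD: steps=10 for training per the quoted setup;
    steps=20 matches the evaluation adversary. eps=8/255 is assumed."""
    # Random start inside the l_inf ball of radius eps.
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        # Signed gradient step, then project back onto the eps-ball
        # and keep the perturbed image inside the valid [0, 1] box.
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = ((x + delta).clamp(0, 1) - x).detach().requires_grad_(True)
    return (x + delta).detach()

def train(model, loader, epochs=200, device="cuda"):
    # lr, momentum, and weight decay are assumed defaults, not the paper's.
    opt = torch.optim.SGD(model.parameters(), lr=0.1,
                          momentum=0.9, weight_decay=5e-4)
    # Decay the learning rate by 10x at epochs 100 and 150, as quoted.
    sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[100, 150],
                                                 gamma=0.1)
    for _ in range(epochs):
        model.train()
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            x_adv = pgd_attack(model, x, y)  # adversarial training examples
            loss = F.cross_entropy(model(x_adv), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
        sched.step()
```

Calling pgd_attack with steps=20 mirrors the 20-step evaluation adversary used for the TRADES comparison.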