Balance, Imbalance, and Rebalance: Understanding Robust Overfitting from a Minimax Game Perspective
Authors: Yifei Wang, Liangchen Li, Jiansheng Yang, Zhouchen Lin, Yisen Wang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate this understanding with extensive experiments, and provide a holistic view of robust overfitting from the dynamics of both game players. This understanding further inspires us to alleviate robust overfitting by rebalancing the two players, either by regularizing the trainer's capacity or by improving the attack strength. Experiments show that the proposed ReBalanced Adversarial Training (ReBAT) can attain good robustness and does not suffer from robust overfitting even after very long training. |
| Researcher Affiliation | Academia | 1 School of Mathematical Sciences, Peking University; 2 Department of Engineering, University of Cambridge; 3 National Key Lab of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; 4 Institute for Artificial Intelligence, Peking University; 5 Peng Cheng Laboratory |
| Pseudocode | No | The paper describes methods textually and with equations, but does not provide any explicitly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | Yes | Code is available at https://github.com/PKU-ML/ReBAT. |
| Open Datasets | Yes | We consider the classification tasks on CIFAR-10, CIFAR-100 [23], and Tiny-ImageNet [10] with the PreActResNet-18 [17] and WideResNet-34-10 [57] architectures. CIFAR-10 contains 60,000 32×32 RGB images from 10 classes; for each class, there are 5,000 images for training and 1,000 for evaluation. |
| Dataset Splits | Yes | Following Rice et al. [38], we hold out 1,000 images from the original CIFAR-10/100 training sets, and similarly 2,000 images from the original Tiny-ImageNet training set, as validation sets. (A minimal split sketch follows this table.) |
| Hardware Specification | Yes | Each model included in this paper is trained on a single NVIDIA GeForce RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions using Python and related libraries but does not specify version numbers for libraries like PyTorch or TensorFlow, nor CUDA versions. |
| Experiment Setup | Yes | We use a PGD-10 attack [30] with step size α = 2/255 and perturbation norm ε = 8/255 to craft adversarial examples on-the-fly. Following the settings in Madry et al. [30], we use an SGD optimizer with momentum 0.9, weight decay 5×10⁻⁴, and batch size 128 to train the model for as many as 1,000 epochs. The learning rate (LR) is initially set to 0.1, decays to 0.01 at epoch 100, and further decays to 0.001 at epoch 150. (A training-setup sketch follows this table.) |
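The validation split described in the Dataset Splits row can be illustrated concretely. Below is a minimal sketch assuming a standard PyTorch/torchvision CIFAR-10 pipeline; the random seed and index choice are our assumptions, since the paper does not specify which 1,000 images are held out.

```python
import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms

# Load the full CIFAR-10 training set (50,000 images).
full_train = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())

# Hold out 1,000 images as a validation set, following Rice et al. [38].
# The paper does not say which images are held out, so a seeded random
# permutation is used here purely as an assumption.
perm = torch.randperm(len(full_train),
                      generator=torch.Generator().manual_seed(0))
val_set = Subset(full_train, perm[:1000].tolist())
train_set = Subset(full_train, perm[1000:].tolist())
```

The same pattern applies to CIFAR-100, and to Tiny-ImageNet with 2,000 held-out images.

Likewise, the hyperparameters quoted in the Experiment Setup row can be assembled into a short PyTorch sketch. This is our reconstruction from the quoted numbers, not the authors' released code (see the ReBAT repository for that); `resnet18` stands in for the PreActResNet-18 used in the paper.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """L-infinity PGD-10: step size alpha = 2/255, radius eps = 8/255."""
    # Random start inside the eps-ball, clipped to valid pixel range.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Gradient ascent on the loss, then project back into the eps-ball.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

# Stand-in architecture; the paper uses PreActResNet-18 / WideResNet-34-10.
model = resnet18(num_classes=10)

# SGD with momentum 0.9, weight decay 5e-4; LR 0.1 -> 0.01 (epoch 100)
# -> 0.001 (epoch 150), i.e. a 10x decay at milestones 100 and 150.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[100, 150], gamma=0.1)
```

A training step would then craft `x_adv = pgd_attack(model, x, y)` on-the-fly for each batch of size 128, minimize the cross-entropy on `x_adv`, and call `scheduler.step()` once per epoch.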