Robust Overfitting may be mitigated by properly learned smoothening
Authors: Tianlong Chen, Zhenyu Zhang, Sijia Liu, Shiyu Chang, Zhangyang Wang
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that by plugging in them to AT, we can simultaneously boost the standard accuracy by 3.72%∼6.68% and robust accuracy by 0.22%∼2.03%, across multiple datasets (STL-10, SVHN, CIFAR-10, CIFAR-100, and Tiny ImageNet), perturbation types (ℓ∞ and ℓ2), and robustified methods (PGD, TRADES, and FSGM), establishing the new state-of-the-art bar in AT. |
| Researcher Affiliation | Collaboration | 1 University of Texas at Austin, 2 University of Science and Technology of China, 3 Michigan State University, 4 MIT-IBM Watson AI Lab, IBM Research |
| Pseudocode | No (SWA sketch below) | The paper provides mathematical equations (e.g., Equation 2 for SWA) but no explicitly labeled pseudocode blocks or algorithms. |
| Open Source Code | Yes | Codes are available at https://github.com/VITA-Group/Alleviate-Robust-Overfitting. |
| Open Datasets | Yes | Datasets: We consider five datasets in our experiments: CIFAR-10, CIFAR-100 (Krizhevsky & Hinton, 2009), SVHN (Netzer et al., 2011), STL-10 (Coates et al., 2011) and Tiny-ImageNet (Deng et al., 2009). |
| Dataset Splits | Yes (split sketch below) | In all experiments, we randomly split the original training set into one training and one validation set with a 9:1 ratio. |
| Hardware Specification | No | The paper does not specify the exact hardware used for experiments, such as particular GPU or CPU models. |
| Software Dependencies | Yes (attack sketch below) | We use the official implementation and default settings for Auto-Attack (ℓ∞ with ϵ = 8/255 and ℓ2 with ϵ = 128/255) and the implementation from AdverTorch (Ding et al., 2019) for the CW attack with the same setting as Rony et al. (2019). |
| Experiment Setup | Yes (training sketch below) | For training, we adopt an SGD optimizer with a momentum of 0.9 and weight decay of 5×10⁻⁴, for a total of 200 epochs, with a batch size of 128. The learning rate starts from 0.1 (0.01 for SVHN (Rice et al., 2020)) and decays to one-tenth at epochs 50 and 150, respectively. For Tiny-ImageNet, we train for 100 epochs, and the learning rate decays at epochs 50 and 80, with other settings unchanged. |
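The Pseudocode row notes that the paper presents SWA only as an equation (its Equation 2). As a point of reference, here is a minimal PyTorch sketch of the standard SWA running-average update from Izmailov et al. (2018); it is an assumption that the paper's Equation 2 follows this standard form, and the helper name `update_swa` is ours, not the authors'.

```python
import torch

@torch.no_grad()
def update_swa(swa_model, model, n_averaged):
    """One step of the standard SWA running average:
    w_swa <- (w_swa * n + w) / (n + 1)."""
    for p_swa, p in zip(swa_model.parameters(), model.parameters()):
        p_swa.mul_(n_averaged / (n_averaged + 1.0)).add_(p, alpha=1.0 / (n_averaged + 1.0))
    return n_averaged + 1

# Hypothetical usage after a burn-in epoch:
#   swa_model = copy.deepcopy(model); n = 0
#   then once per epoch: n = update_swa(swa_model, model, n)
# BatchNorm statistics of swa_model must be recomputed before evaluation
# (e.g., with torch.optim.swa_utils.update_bn). PyTorch >= 1.6 also ships
# torch.optim.swa_utils.AveragedModel, which implements the same running average.
```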
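For the Dataset Splits row, a minimal sketch of a 9:1 random train/validation split with PyTorch and torchvision is shown below; the transform and seed are illustrative assumptions, not values taken from the paper.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# 9:1 random split of the CIFAR-10 training set into training and
# validation subsets, as described in the paper's setup.
full_train = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())
n_train = int(0.9 * len(full_train))   # 45,000
n_val = len(full_train) - n_train      # 5,000
train_set, val_set = random_split(
    full_train, [n_train, n_val],
    generator=torch.Generator().manual_seed(0))  # seed is an arbitrary choice
```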
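For the Software Dependencies row, the sketch below shows how the quoted robustness evaluation could be wired up with the official AutoAttack package and AdverTorch; the `model`, `x_test`, and `y_test` variables and the CW hyperparameters are assumptions for illustration, not the paper's exact configuration.

```python
from autoattack import AutoAttack                      # official Auto-Attack package
from advertorch.attacks import CarliniWagnerL2Attack   # AdverTorch (Ding et al., 2019)

# `model`, `x_test`, `y_test` are assumed to be defined elsewhere.
# Auto-Attack, l_inf threat model with eps = 8/255 (default "standard" version).
adversary = AutoAttack(model, norm='Linf', eps=8 / 255, version='standard')
x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=128)

# CW (l2) attack via AdverTorch; hyperparameters here are library defaults plus an
# illustrative iteration budget, not necessarily the Rony et al. (2019) setting.
cw = CarliniWagnerL2Attack(model, num_classes=10, max_iterations=100)
x_adv_cw = cw.perturb(x_test, y_test)
```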
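Finally, for the Experiment Setup row, here is a minimal sketch of the quoted optimizer and learning-rate schedule in PyTorch; the training-loop body and data loader are placeholders, and this is a sketch under the quoted hyperparameters, not the authors' released code.

```python
import torch
from torch.optim.lr_scheduler import MultiStepLR

# `model` and a train_loader with batch size 128 are assumed to exist.
optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.1,             # 0.01 for SVHN, per Rice et al. (2020)
                            momentum=0.9,
                            weight_decay=5e-4)
# Decay to one-tenth at epochs 50 and 150 over 200 epochs;
# for Tiny-ImageNet: 100 epochs with milestones [50, 80].
scheduler = MultiStepLR(optimizer, milestones=[50, 150], gamma=0.1)

for epoch in range(200):
    # ... adversarial training epoch over train_loader goes here ...
    scheduler.step()
```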