Robust Overfitting may be mitigated by properly learned smoothening

Authors: Tianlong Chen, Zhenyu Zhang, Sijia Liu, Shiyu Chang, Zhangyang Wang

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that by plugging them into AT, we can simultaneously boost the standard accuracy by 3.72%–6.68% and robust accuracy by 0.22%–2.03%, across multiple datasets (STL-10, SVHN, CIFAR-10, CIFAR-100, and Tiny-ImageNet), perturbation types (ℓ∞ and ℓ2), and robustified methods (PGD, TRADES, and FGSM), establishing the new state-of-the-art bar in AT.
Researcher Affiliation | Collaboration | 1 University of Texas at Austin, 2 University of Science and Technology of China, 3 Michigan State University, 4 MIT-IBM Watson AI Lab, IBM Research
Pseudocode | No | The paper provides mathematical equations (e.g., Equation 2 for SWA) but no explicitly labeled pseudocode blocks or algorithms (see the SWA sketch after the table).
Open Source Code | Yes | Codes are available at https://github.com/VITA-Group/Alleviate-Robust-Overfitting.
Open Datasets | Yes | We consider five datasets in our experiments: CIFAR-10, CIFAR-100 (Krizhevsky & Hinton, 2009), SVHN (Netzer et al., 2011), STL-10 (Coates et al., 2011) and Tiny-ImageNet (Deng et al., 2009).
Dataset Splits | Yes | In all experiments, we randomly split the original training set into one training and one validation set with a 9:1 ratio (see the data-split sketch after the table).
Hardware Specification | No | The paper does not specify the exact hardware used for experiments, such as particular GPU or CPU models.
Software Dependencies | Yes | We use the official implementation and default settings for AutoAttack (ℓ∞ with ϵ = 8/255 and ℓ2 with ϵ = 128/255) and the implementation from AdverTorch (Ding et al., 2019) for the CW attack with the same setting as Rony et al. (2019) (see the evaluation sketch after the table).
Experiment Setup | Yes | For training, we adopt an SGD optimizer with a momentum of 0.9 and weight decay of 5×10⁻⁴, for a total of 200 epochs, with a batch size of 128. The learning rate starts from 0.1 (0.01 for SVHN (Rice et al., 2020)) and decays to one-tenth at epochs 50 and 150, respectively. For Tiny-ImageNet, we train for 100 epochs, and the learning rate decays at epochs 50 and 80 with other settings unchanged (see the training sketch after the table).
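
SWA sketch. The Equation 2 mentioned under "Pseudocode" refers to stochastic weight averaging, which keeps a running average of the weights visited late in training. The snippet below is a minimal PyTorch sketch of that running average, not the authors' implementation; the function name and the burn-in handling are illustrative assumptions (PyTorch also ships torch.optim.swa_utils.AveragedModel for the same purpose).

import copy
import torch

def update_swa(swa_model, model, n_averaged):
    # Running average over parameters: theta_swa <- (n * theta_swa + theta) / (n + 1).
    with torch.no_grad():
        for p_swa, p in zip(swa_model.parameters(), model.parameters()):
            p_swa.mul_(n_averaged / (n_averaged + 1.0)).add_(p / (n_averaged + 1.0))
    return n_averaged + 1

# Illustrative usage: after a burn-in epoch, copy the model once, then call
# update_swa at the end of every subsequent epoch.
# swa_model = copy.deepcopy(model); n_averaged = 1
# n_averaged = update_swa(swa_model, model, n_averaged)

Note that the BatchNorm statistics of swa_model would still need to be recomputed on training data before evaluation (e.g., with torch.optim.swa_utils.update_bn).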
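
Data-split sketch. A minimal sketch of the 9:1 train/validation split quoted under "Dataset Splits", using torchvision CIFAR-10 as the example dataset; the seed, transform, and loader settings are placeholder choices rather than values taken from the paper.

import torch
from torch.utils.data import random_split, DataLoader
from torchvision import datasets, transforms

transform = transforms.ToTensor()
full_train = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)

n_train = int(0.9 * len(full_train))             # 45,000 images for CIFAR-10
n_val = len(full_train) - n_train                # 5,000 images
train_set, val_set = random_split(
    full_train, [n_train, n_val],
    generator=torch.Generator().manual_seed(0)   # fixed seed so the split is reproducible
)

train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)
val_loader = DataLoader(val_set, batch_size=128, shuffle=False, num_workers=4)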
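
Evaluation sketch. A hedged sketch of the robustness evaluation quoted under "Software Dependencies", assuming the pip-installable autoattack and advertorch packages; the placeholder model and data and the default CW hyperparameters are illustrative assumptions, not the paper's exact configuration.

import torch
import torchvision
from autoattack import AutoAttack                      # official AutoAttack implementation
from advertorch.attacks import CarliniWagnerL2Attack   # AdverTorch CW attack

model = torchvision.models.resnet18(num_classes=10).eval()  # placeholder network
x_test = torch.rand(8, 3, 32, 32)                           # placeholder batch in [0, 1]
y_test = torch.randint(0, 10, (8,))

# AutoAttack, l_inf with eps = 8/255 (use norm='L2', eps=128/255 for the l2 setting)
adversary = AutoAttack(model, norm='Linf', eps=8 / 255, version='standard')
x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=8)

# CW attack through AdverTorch (l2 formulation, library defaults)
cw = CarliniWagnerL2Attack(model, num_classes=10)
x_cw = cw.perturb(x_test, y_test)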
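
Training sketch. A minimal sketch of the optimization schedule quoted under "Experiment Setup" (SGD with momentum 0.9 and weight decay 5×10⁻⁴, 200 epochs, batch size 128, learning rate 0.1 decayed tenfold at epochs 50 and 150); the network and the body of the training loop are placeholders, not the authors' training script.

import torch
import torchvision

model = torchvision.models.resnet18(num_classes=10)            # placeholder network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,        # 0.01 for SVHN per the paper
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[50, 150], gamma=0.1)

for epoch in range(200):   # 100 epochs with milestones [50, 80] for Tiny-ImageNet
    # ... one epoch of (adversarial) training with batch size 128 goes here ...
    scheduler.step()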