Robust Overfitting may be mitigated by properly learned smoothening

Authors: Tianlong Chen, Zhenyu Zhang, Sijia Liu, Shiyu Chang, Zhangyang Wang

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that by plugging them into AT, we can simultaneously boost the standard accuracy by 3.72%–6.68% and robust accuracy by 0.22%–2.03%, across multiple datasets (STL-10, SVHN, CIFAR-10, CIFAR-100, and Tiny-ImageNet), perturbation types (ℓ∞ and ℓ2), and robustified methods (PGD, TRADES, and FGSM), establishing the new state-of-the-art bar in AT.
Researcher Affiliation | Collaboration | 1 University of Texas at Austin, 2 University of Science and Technology of China, 3 Michigan State University, 4 MIT-IBM Watson AI Lab, IBM Research
Pseudocode | No | The paper provides mathematical equations (e.g., Equation 2 for SWA) but no explicitly labeled pseudocode blocks or algorithms (see the SWA sketch after the table).
Open Source Code | Yes | Codes are available at https://github.com/VITA-Group/Alleviate-Robust-Overfitting.
Open Datasets | Yes | We consider five datasets in our experiments: CIFAR-10, CIFAR-100 (Krizhevsky & Hinton, 2009), SVHN (Netzer et al., 2011), STL-10 (Coates et al., 2011) and Tiny-ImageNet (Deng et al., 2009).
Dataset Splits | Yes | In all experiments, we randomly split the original training set into one training and one validation set with a 9:1 ratio (see the data-split sketch after the table).
Hardware Specification | No | The paper does not specify the exact hardware used for experiments, such as particular GPU or CPU models.
Software Dependencies | Yes | We use the official implementation and default settings for AutoAttack (ℓ∞ with ϵ = 8/255 and ℓ2 with ϵ = 128/255) and the implementation from AdverTorch (Ding et al., 2019) for the CW attack with the same setting as Rony et al. (2019) (see the evaluation sketch after the table).
Experiment Setup | Yes | For training, we adopt an SGD optimizer with a momentum of 0.9 and weight decay of 5×10⁻⁴, for a total of 200 epochs, with a batch size of 128. The learning rate starts from 0.1 (0.01 for SVHN (Rice et al., 2020)) and decays to one-tenth at epochs 50 and 150, respectively. For Tiny-ImageNet, we train for 100 epochs, and the learning rate decays at epochs 50 and 80 with other settings unchanged (see the training sketch after the table).
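
SWA sketch. The Equation 2 mentioned under "Pseudocode" refers to stochastic weight averaging, which keeps a running average of the weights visited late in training. The snippet below is a minimal PyTorch sketch of that running average, not the authors' implementation; the function name and the burn-in handling are illustrative assumptions (PyTorch also ships torch.optim.swa_utils.AveragedModel for the same purpose).

import copy
import torch

def update_swa(swa_model, model, n_averaged):
    # Running average over parameters: theta_swa <- (n * theta_swa + theta) / (n + 1).
    with torch.no_grad():
        for p_swa, p in zip(swa_model.parameters(), model.parameters()):
            p_swa.mul_(n_averaged / (n_averaged + 1.0)).add_(p / (n_averaged + 1.0))
    return n_averaged + 1

# Illustrative usage: after a burn-in epoch, copy the model once, then call
# update_swa at the end of every subsequent epoch.
# swa_model = copy.deepcopy(model); n_averaged = 1
# n_averaged = update_swa(swa_model, model, n_averaged)

Note that the BatchNorm statistics of swa_model would still need to be recomputed on training data before evaluation (e.g., with torch.optim.swa_utils.update_bn).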
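
Data-split sketch. A minimal sketch of the 9:1 train/validation split quoted under "Dataset Splits", using torchvision CIFAR-10 as the example dataset; the seed, transform, and loader settings are placeholder choices rather than values taken from the paper.

import torch
from torch.utils.data import random_split, DataLoader
from torchvision import datasets, transforms

transform = transforms.ToTensor()
full_train = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)

n_train = int(0.9 * len(full_train))             # 45,000 images for CIFAR-10
n_val = len(full_train) - n_train                # 5,000 images
train_set, val_set = random_split(
    full_train, [n_train, n_val],
    generator=torch.Generator().manual_seed(0)   # fixed seed so the split is reproducible
)

train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)
val_loader = DataLoader(val_set, batch_size=128, shuffle=False, num_workers=4)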
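
Evaluation sketch. A hedged sketch of the robustness evaluation quoted under "Software Dependencies", assuming the pip-installable autoattack and advertorch packages; the placeholder model and data and the default CW hyperparameters are illustrative assumptions, not the paper's exact configuration.

import torch
import torchvision
from autoattack import AutoAttack                      # official AutoAttack implementation
from advertorch.attacks import CarliniWagnerL2Attack   # AdverTorch CW attack

model = torchvision.models.resnet18(num_classes=10).eval()  # placeholder network
x_test = torch.rand(8, 3, 32, 32)                           # placeholder batch in [0, 1]
y_test = torch.randint(0, 10, (8,))

# AutoAttack, l_inf with eps = 8/255 (use norm='L2', eps=128/255 for the l2 setting)
adversary = AutoAttack(model, norm='Linf', eps=8 / 255, version='standard')
x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=8)

# CW attack through AdverTorch (l2 formulation, library defaults)
cw = CarliniWagnerL2Attack(model, num_classes=10)
x_cw = cw.perturb(x_test, y_test)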
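
Training sketch. A minimal sketch of the optimization schedule quoted under "Experiment Setup" (SGD with momentum 0.9 and weight decay 5×10⁻⁴, 200 epochs, batch size 128, learning rate 0.1 decayed tenfold at epochs 50 and 150); the network and the body of the training loop are placeholders, not the authors' training script.

import torch
import torchvision

model = torchvision.models.resnet18(num_classes=10)            # placeholder network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,        # 0.01 for SVHN per the paper
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[50, 150], gamma=0.1)

for epoch in range(200):   # 100 epochs with milestones [50, 80] for Tiny-ImageNet
    # ... one epoch of (adversarial) training with batch size 128 goes here ...
    scheduler.step()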