Squeeze Training for Adversarial Robustness

Authors: Qizhang Li, Yiwen Guo, Wangmeng Zuo, Hao Chen

Venue: ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results verify the effectiveness of our method. We demonstrate that ST outperforms the state of the art remarkably on several benchmark datasets, achieving an absolute robust accuracy gain of >+1.00% without utilizing additional data on CIFAR-10.
Researcher Affiliation | Collaboration | Qizhang Li (Harbin Institute of Technology; Tencent Security Big Data Lab), Yiwen Guo (Independent Researcher), Wangmeng Zuo (Harbin Institute of Technology), Hao Chen (UC Davis)
Pseudocode | Yes | Algorithm 1: Squeeze Training (ST)
Open Source Code | Yes | Code: https://github.com/qizhangli/ST-AT
Open Datasets | Yes | Experiments are conducted on popular benchmark datasets, including CIFAR-10, CIFAR-100 (Krizhevsky & Hinton, 2009), and SVHN (Netzer et al., 2011).
Dataset Splits | No | The paper mentions training on CIFAR-10, CIFAR-100, and SVHN and states, 'we select the model with the best PGD-20 performance from all checkpoints'. While this implies a validation process, it does not explicitly provide details about the validation dataset split (e.g., percentages, sample counts, or how it was created from the training data).
Hardware Specification | Yes | All models are trained on an NVIDIA Tesla V100 GPU.
Software Dependencies | No | The paper mentions various methods and optimizers (e.g., the SGD optimizer, AutoAttack) but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | In most experiments in this section, we perform adversarial training with a perturbation budget of ϵ = 8/255 and an inner step size α = 2/255, except for the SVHN dataset, where we use α = 1/255. In the training phase, we always use an SGD optimizer with a momentum of 0.9, a weight decay of 0.0005, and a batch size of 128. We train ResNet-18 (He et al., 2016a) for 120 epochs on CIFAR-10 and CIFAR-100, adopting an initial learning rate of 0.1 that is divided by 10 at the 80th and 100th epochs. For SVHN, we train ResNet-18 for 80 epochs with an initial learning rate of 0.01, divided by 10 at the 50th and 65th epochs. We adopt β = 6 for TRADES and β = 5 for MART, following their original papers. The final choice of the regularization function ℓ_reg and the scaling factor β in our ST is given in Section 5.1. ... We use β = 6 for CIFAR-10, β = 4 for CIFAR-100, and β = 8 for SVHN.
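
For reference, the reported CIFAR-10 configuration (SGD with momentum 0.9, weight decay 0.0005, batch size 128, learning rate 0.1 divided by 10 at epochs 80 and 100, ϵ = 8/255, α = 2/255) maps onto a short PyTorch sketch like the one below. This is only a hedged reconstruction of the setup, not the authors' code: the architecture is torchvision's ResNet-18 rather than the CIFAR-style ResNet-18 of He et al. (2016a), the data augmentation is an assumption, and the inner loss is plain PGD adversarial training standing in for the ST objective. Consult https://github.com/qizhangli/ST-AT for the actual implementation.

```python
# Hedged sketch of the reported training configuration (CIFAR-10 values).
# The loss below is standard PGD adversarial training, NOT the paper's
# Squeeze Training objective; it is used only to make the setup concrete.
import torch
import torch.nn.functional as F
import torchvision
from torchvision import transforms

EPSILON = 8 / 255   # perturbation budget reported in the paper
ALPHA = 2 / 255     # inner (PGD) step size; 1/255 for SVHN
EPOCHS = 120        # 80 for SVHN
BATCH_SIZE = 128

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True,
    transform=transforms.Compose([
        transforms.RandomCrop(32, padding=4),   # augmentation is an assumption
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ]),
)
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=BATCH_SIZE, shuffle=True, num_workers=2
)

# Placeholder architecture: torchvision's ResNet-18, not the CIFAR-style variant.
model = torchvision.models.resnet18(num_classes=10).cuda()

optimizer = torch.optim.SGD(
    model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4
)
# Learning rate divided by 10 at the 80th and 100th epochs
# (50th and 65th epochs, starting from 0.01, for SVHN).
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[80, 100], gamma=0.1
)


def pgd_attack(model, x, y, epsilon, alpha, steps=10):
    """L_inf PGD with random start; a stand-in for the paper's inner maximization."""
    delta = torch.empty_like(x).uniform_(-epsilon, epsilon)
    delta = (x + delta).clamp(0, 1) - x
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        delta = (delta + alpha * grad.sign()).clamp(-epsilon, epsilon).detach()
        delta = (x + delta).clamp(0, 1) - x
    return (x + delta).detach()


for epoch in range(EPOCHS):
    model.train()
    for x, y in train_loader:
        x, y = x.cuda(), y.cuda()
        x_adv = pgd_attack(model, x, y, EPSILON, ALPHA)
        # The ST regularized loss (with its beta weighting) would replace this line.
        loss = F.cross_entropy(model(x_adv), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```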