Squeeze Training for Adversarial Robustness
Authors: Qizhang Li, Yiwen Guo, Wangmeng Zuo, Hao Chen
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results verify the effectiveness of our method. We demonstrate that ST outperforms state-of-the-arts remarkably on several benchmark datasets, achieving an absolute robust accuracy gain of >+1.00% without utilizing additional data on CIFAR-10. |
| Researcher Affiliation | Collaboration | Qizhang Li1,2, Yiwen Guo3, Wangmeng Zuo1, Hao Chen4; 1Harbin Institute of Technology, 2Tencent Security Big Data Lab, 3Independent Researcher, 4UC Davis |
| Pseudocode | Yes | Algorithm 1 Squeeze Training (ST) |
| Open Source Code | Yes | Code: https://github.com/qizhangli/ST-AT. |
| Open Datasets | Yes | Experiments are conducted on popular benchmark datasets, including CIFAR-10, CIFAR-100 (Krizhevsky & Hinton, 2009), and SVHN (Netzer et al., 2011). |
| Dataset Splits | No | The paper mentions training on CIFAR-10, CIFAR-100, and SVHN and states, 'we select the model with the best PGD-20 performance from all checkpoints'. While this implies a validation process, it does not explicitly provide details about the validation dataset split (e.g., percentages, sample counts, or how it was created from the training data). |
| Hardware Specification | Yes | All models are trained on an NVIDIA Tesla-V100 GPU. |
| Software Dependencies | No | The paper mentions the methods and tools it uses (e.g., the SGD optimizer, AutoAttack) but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | In most experiments in this section, we perform adversarial training with a perturbation budget of ϵ = 8/255 and an inner step size α = 2/255, except for the SVHN dataset, where we use α = 1/255. In the training phase, we always use an SGD optimizer with a momentum of 0.9, a weight decay of 0.0005, and a batch size of 128. We train ResNet-18 (He et al., 2016a) for 120 epochs on CIFAR-10 and CIFAR-100, and we adopt an initial learning rate of 0.1 and cut it by 10 at the 80th and 100th epochs. For SVHN, we train ResNet-18 for 80 epochs with an initial learning rate of 0.01, and we cut it by 10 at the 50th and 65th epochs. We adopt β = 6 for TRADES and β = 5 for MART by following their original papers. The final choice for the regularization function ℓreg and the scaling factor β in our ST will be given in Section 5.1. ... and we use β = 6 for CIFAR-10, β = 4 for CIFAR-100, and β = 8 for SVHN. |
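
The reported training recipe maps onto a standard PyTorch setup. The sketch below is a minimal, hypothetical reconstruction of that configuration: the optimizer, learning-rate schedule, and PGD perturbation budget follow the numbers quoted in the Experiment Setup row, while the model, data loaders, the ST regularizer ℓreg, and the number of inner PGD steps are placeholders or assumptions not stated in this excerpt.

```python
# Sketch of the reported training configuration (ResNet-18 on CIFAR-10/100 and SVHN).
# Hyperparameters follow the table row above; everything else is illustrative only.
import torch
import torch.nn as nn
import torch.optim as optim


def make_optimizer_and_schedule(model: nn.Module, dataset: str = "cifar10"):
    """SGD with momentum 0.9 and weight decay 5e-4 (batch size 128 is set in the data loader).

    CIFAR-10/100: 120 epochs, initial lr 0.1, divided by 10 at epochs 80 and 100.
    SVHN:          80 epochs, initial lr 0.01, divided by 10 at epochs 50 and 65.
    """
    if dataset in ("cifar10", "cifar100"):
        lr, milestones, epochs = 0.1, [80, 100], 120
    else:  # svhn
        lr, milestones, epochs = 0.01, [50, 65], 80
    optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=5e-4)
    scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=milestones, gamma=0.1)
    return optimizer, scheduler, epochs


def pgd_perturb(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                eps: float = 8 / 255, alpha: float = 2 / 255, steps: int = 10):
    """PGD-style inner maximization with the reported budget (use alpha = 1/255 for SVHN).

    The number of inner steps (10) is an assumption; it is not stated in the excerpt above.
    """
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = nn.functional.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = (x + delta).clamp(0, 1) - x          # keep the perturbed image in [0, 1]
        delta = delta.detach().requires_grad_(True)
    return (x + delta).detach()
```

The scaling factor β quoted above (6 for CIFAR-10, 4 for CIFAR-100, 8 for SVHN) weights the ST regularization term; the choice of ℓreg itself is specified in Section 5.1 of the paper and is not reproduced here.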