Efficient Robust Training via Backward Smoothing
Authors: Jinghui Chen, Yu Cheng, Zhe Gan, Quanquan Gu, Jingjing Liu
AAAI 2022, pp. 6222-6230
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on multiple benchmarks demonstrate that our method achieves similar model robustness as the original TRADES method while using much less training time (~3x improvement with the same training schedule). In this section, we empirically evaluate the performance of our proposed method. |
| Researcher Affiliation | Collaboration | Jinghui Chen (Pennsylvania State University), Yu Cheng (Microsoft Corporation), Zhe Gan (Microsoft Corporation), Quanquan Gu (University of California, Los Angeles), Jingjing Liu (Tsinghua University) |
| Pseudocode | Yes | Algorithm 1: Backward Smoothing (a hedged training-step sketch is given below the table) |
| Open Source Code | No | The paper does not provide explicit links to its own source code for the methodology. Footnotes mention external GitHub repositories (e.g., 'https://github.com/fra31/auto-attack' and 'https://github.com/uclaml/RayS'), which are tools used for evaluation, not the authors' implementation code. |
| Open Datasets | Yes | CIFAR-10, CIFAR-100 (Krizhevsky, Hinton et al. 2009) and Tiny ImageNet (Deng et al. 2009) datasets. |
| Dataset Splits | Yes | We set ϵ = 0.031 for all three datasets. In terms of model architecture, we adopt standard ResNet-18 model (He et al. 2016) for both CIFAR-10 and CIFAR-100 datasets, and ResNet-50 model for Tiny ImageNet. We follow the standard piecewise learning rate decay schedule as used in (Madry et al. 2018; Zhang et al. 2019) and set decaying point at 50th and 75th epochs. The starting learning rate for all methods is set to 0.1, the same as previous work (Madry et al. 2018; Zhang et al. 2019). For all methods, we tune the models for their best robustness performances for a fair comparison. |
| Hardware Specification | Yes | All the experiments are conducted on RTX2080Ti GPU servers. |
| Software Dependencies | No | The paper mentions general techniques like 'cyclic learning rate decay schedule (Smith 2017)' and 'mixed-precision training (Micikevicius et al. 2018)' but does not specify software dependencies with version numbers (e.g., PyTorch 1.9, CUDA 11.1). |
| Experiment Setup | Yes | Following previous work on robust training (Madry et al. 2018; Zhang et al. 2019; Wong, Rice, and Kolter 2020), we set ϵ = 0.031 for all three datasets. In terms of model architecture, we adopt standard ResNet-18 model (He et al. 2016) for both CIFAR-10 and CIFAR-100 datasets, and ResNet-50 model for Tiny ImageNet. We follow the standard piecewise learning rate decay schedule as used in (Madry et al. 2018; Zhang et al. 2019) and set decaying point at 50th and 75th epochs. The starting learning rate for all methods is set to 0.1, the same as previous work (Madry et al. 2018; Zhang et al. 2019). For Adversarial Training and TRADES methods, we adopt a 10-step iterative PGD attack with a step size of 2/255 for both. For our proposed method, we set the backward smoothing parameter γ = 1 and step size as 8/255. For other fast training methods, we use a step size of 10/255 for Fast AT/GradAlign, 6/255 for 2-step Fast AT, 6/255 for Fast TRADES and 5/255 for 2-step Fast TRADES. (Hedged sketches of the baseline PGD attack and the learning-rate schedule appear below the table.) |
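
For context on the Pseudocode row, the following is a minimal PyTorch-style sketch of what one training step of Algorithm 1 (Backward Smoothing) could look like, assuming a TRADES-like objective with a single-step perturbation generated from a random start. The function name, the exact KL direction, and the `model`/`optimizer` objects are assumptions for illustration, not the authors' released implementation; only the quoted hyperparameters (ϵ = 0.031 ≈ 8/255, step size 8/255, γ = 1) are taken from the paper.

```python
# Hypothetical sketch of one Backward-Smoothing-style training step, NOT the
# authors' code. Assumes inputs x are scaled to [0, 1] and a TRADES-like
# objective: clean cross-entropy plus gamma * KL smoothing toward a
# single-step adversarial example generated from a random start.
import torch
import torch.nn.functional as F

EPS = 8 / 255     # quoted perturbation budget (0.031)
STEP = 8 / 255    # quoted single-step size for the proposed method
GAMMA = 1.0       # quoted backward smoothing parameter gamma

def backward_smoothing_step(model, optimizer, x, y):
    # Reference prediction on the clean input (no gradient needed here).
    model.eval()
    with torch.no_grad():
        clean_probs = F.softmax(model(x), dim=1)

    # Single-step perturbation from a uniform random start inside the
    # epsilon-ball, ascending the KL divergence (direction assumed).
    delta = torch.empty_like(x).uniform_(-EPS, EPS).requires_grad_(True)
    kl = F.kl_div(F.log_softmax(model(x + delta), dim=1),
                  clean_probs, reduction="batchmean")
    grad, = torch.autograd.grad(kl, delta)
    delta = (delta + STEP * grad.sign()).clamp(-EPS, EPS).detach()
    x_adv = (x + delta).clamp(0.0, 1.0)

    # Training loss: clean cross-entropy + gamma * smoothing term
    # (the KL direction here is an assumption, not confirmed by the table).
    model.train()
    logits_clean = model(x)
    logits_adv = model(x_adv)
    smooth = F.kl_div(F.log_softmax(logits_adv, dim=1),
                      F.softmax(logits_clean, dim=1), reduction="batchmean")
    loss = F.cross_entropy(logits_clean, y) + GAMMA * smooth

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```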
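The Experiment Setup row quotes a 10-step PGD attack with step size 2/255 under ϵ = 0.031 for the Adversarial Training and TRADES baselines. A minimal L∞ PGD sketch with those quoted numbers is shown below; the random start and the cross-entropy attack objective are assumptions (TRADES, in particular, attacks a KL objective instead).

```python
# Minimal L-infinity PGD sketch with the quoted baseline hyperparameters
# (10 steps, step size 2/255, epsilon = 0.031). The random start and the
# cross-entropy objective are assumptions; inputs are assumed in [0, 1].
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.031, step=2 / 255, n_steps=10):
    delta = torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(n_steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        # Ascend the loss, then project back into the epsilon-ball.
        delta = (delta.detach() + step * grad.sign()).clamp(-eps, eps)
    return (x + delta).clamp(0.0, 1.0)
```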
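The quoted training schedule (starting learning rate 0.1, piecewise decay at the 50th and 75th epochs) maps naturally onto a standard PyTorch `MultiStepLR` scheduler. In the sketch below, the SGD momentum, weight decay, decay factor, total epoch count, and the placeholder model are assumptions; only the initial learning rate and the milestones come from the quoted setup.

```python
# Sketch of the quoted piecewise learning-rate schedule: start at 0.1 and
# decay at the 50th and 75th epochs. Momentum, weight decay, decay factor,
# total epochs, and the placeholder model are assumptions.
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

model = torch.nn.Sequential(                 # placeholder standing in for the
    torch.nn.Flatten(),                      # ResNet-18/ResNet-50 used in the paper
    torch.nn.Linear(3 * 32 * 32, 10))

optimizer = SGD(model.parameters(), lr=0.1,            # quoted starting learning rate
                momentum=0.9, weight_decay=5e-4)       # assumed values
scheduler = MultiStepLR(optimizer, milestones=[50, 75],  # quoted decay epochs
                        gamma=0.1)                        # assumed decay factor

for epoch in range(100):     # total number of epochs assumed
    # ... one epoch of robust training on CIFAR-10/100 or Tiny ImageNet ...
    scheduler.step()
```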