Efficient Robust Training via Backward Smoothing

Authors: Jinghui Chen, Yu Cheng, Zhe Gan, Quanquan Gu, Jingjing Liu

AAAI 2022, pp. 6222-6230

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on multiple benchmarks demonstrate that our method achieves similar model robustness as the original TRADES method while using much less training time (~3x improvement with the same training schedule). In this section, we empirically evaluate the performance of our proposed method.
Researcher Affiliation | Collaboration | Jinghui Chen (1), Yu Cheng (2), Zhe Gan (2), Quanquan Gu (3), Jingjing Liu (4); (1) Pennsylvania State University, (2) Microsoft Corporation, (3) University of California, Los Angeles, (4) Tsinghua University
Pseudocode | Yes | Algorithm 1: Backward Smoothing
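Algorithm 1 itself is not reproduced in this record. Purely for orientation, the sketch below shows a generic single-step, TRADES-style robust training update using the hyper-parameters quoted under Experiment Setup below (γ = 1, a single attack step of size 8/255). It is an illustrative assumption, not the authors' Algorithm 1, and all function and variable names are hypothetical.

```python
# Hypothetical sketch of a single-step, TRADES-style robust training update.
# NOT the paper's exact Algorithm 1; hyper-parameters follow the quoted setup
# (gamma = 1, step size 8/255 = epsilon).
import torch
import torch.nn.functional as F

def single_step_trades_update(model, optimizer, x, y,
                              eps=8/255, alpha=8/255, gamma=1.0):
    # Inner maximization: one step from a random start inside the eps-ball,
    # ascending the KL divergence to the clean prediction (TRADES-style).
    model.eval()
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    with torch.no_grad():
        p_clean = F.softmax(model(x), dim=1)
    kl = F.kl_div(F.log_softmax(model(x + delta), dim=1), p_clean,
                  reduction='batchmean')
    grad = torch.autograd.grad(kl, delta)[0]
    delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()
    x_adv = (x + delta).clamp(0.0, 1.0)

    # Outer minimization: clean cross-entropy plus gamma-weighted KL regularizer.
    model.train()
    optimizer.zero_grad()
    logits_clean = model(x)
    logits_adv = model(x_adv)
    loss = F.cross_entropy(logits_clean, y) + gamma * F.kl_div(
        F.log_softmax(logits_adv, dim=1), F.softmax(logits_clean, dim=1),
        reduction='batchmean')
    loss.backward()
    optimizer.step()
    return loss.item()
```

Replacing the 10-step inner maximization of standard TRADES with a single full-size step is presumably where the quoted ~3x training-time improvement comes from.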
Open Source Code | No | The paper does not provide explicit links to its own source code for the methodology. Footnotes mention external GitHub repositories (e.g., https://github.com/fra31/auto-attack and https://github.com/uclaml/RayS), which are tools used for evaluation, not the authors' implementation code.
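For context, the footnoted auto-attack tool exposes a standard evaluation entry point. The sketch below is a usage illustration only, assuming the package is installed from that repository and importable as autoattack, with eps matching the quoted ϵ = 0.031; it is not the authors' evaluation script.

```python
# Minimal sketch of evaluating a trained classifier with the external AutoAttack
# tool referenced in the footnotes (https://github.com/fra31/auto-attack).
# `model`, `x_test`, and `y_test` are placeholders supplied by the caller.
from autoattack import AutoAttack

def evaluate_autoattack(model, x_test, y_test, eps=0.031, batch_size=128):
    adversary = AutoAttack(model, norm='Linf', eps=eps, version='standard')
    # Returns adversarial examples; robust accuracy is reported by the tool.
    return adversary.run_standard_evaluation(x_test, y_test, bs=batch_size)
```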
Open Datasets | Yes | CIFAR-10, CIFAR-100 (Krizhevsky, Hinton et al. 2009) and Tiny ImageNet (Deng et al. 2009) datasets.
Dataset Splits | Yes | We set ϵ = 0.031 for all three datasets. In terms of model architecture, we adopt standard ResNet-18 model (He et al. 2016) for both CIFAR-10 and CIFAR-100 datasets, and ResNet-50 model for Tiny ImageNet. We follow the standard piecewise learning rate decay schedule as used in (Madry et al. 2018; Zhang et al. 2019) and set decaying point at 50-th and 75-th epochs. The starting learning rate for all methods is set to 0.1, the same as previous work (Madry et al. 2018; Zhang et al. 2019). For all methods, we tune the models for their best robustness performances for a fair comparison.
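A minimal sketch of the quoted piecewise schedule (initial learning rate 0.1, decay at the 50-th and 75-th epochs) follows; the SGD optimizer with momentum and weight decay, and the 0.1 decay factor, are assumptions not stated in this excerpt.

```python
# Sketch of the quoted piecewise learning-rate schedule; the SGD settings
# (momentum, weight decay) and the 0.1 decay factor are illustrative assumptions.
import torch

def build_optimizer_and_schedule(model):
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=5e-4)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[50, 75], gamma=0.1)
    return optimizer, scheduler

# Typical use: call scheduler.step() once per epoch after training that epoch.
```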
Hardware Specification | Yes | All the experiments are conducted on RTX2080Ti GPU servers.
Software Dependencies | No | The paper mentions general techniques like a 'cyclic learning rate decay schedule (Smith 2017)' and 'mixed-precision training (Micikevicius et al. 2018)' but does not specify software dependencies with version numbers (e.g., PyTorch 1.9, CUDA 11.1).
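For reference only, the sketch below pairs common PyTorch stand-ins for the two cited techniques: torch.cuda.amp for mixed-precision training and OneCycleLR as a cyclic schedule. This reflects assumptions about tooling, not the authors' actual dependencies.

```python
# Illustrative stand-ins only: mixed precision via torch.cuda.amp and a cyclic
# learning-rate schedule via OneCycleLR. Nothing here is confirmed by the paper.
import torch
import torch.nn.functional as F

def train_one_epoch_amp(model, loader, optimizer, scheduler, scaler, device):
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():      # forward pass and loss in mixed precision
            loss = F.cross_entropy(model(x), y)
        scaler.scale(loss).backward()        # scale loss to avoid fp16 underflow
        scaler.step(optimizer)
        scaler.update()
        scheduler.step()                     # OneCycleLR is stepped per batch
```

Here scaler would be a torch.cuda.amp.GradScaler() and scheduler a torch.optim.lr_scheduler.OneCycleLR instance.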
Experiment Setup | Yes | Following previous work on robust training (Madry et al. 2018; Zhang et al. 2019; Wong, Rice, and Kolter 2020), we set ϵ = 0.031 for all three datasets. In terms of model architecture, we adopt standard ResNet-18 model (He et al. 2016) for both CIFAR-10 and CIFAR-100 datasets, and ResNet-50 model for Tiny ImageNet. We follow the standard piecewise learning rate decay schedule as used in (Madry et al. 2018; Zhang et al. 2019) and set decaying point at 50-th and 75-th epochs. The starting learning rate for all methods is set to 0.1, the same as previous work (Madry et al. 2018; Zhang et al. 2019). For Adversarial Training and TRADES methods, we adopt a 10-step iterative PGD attack with a step size of 2/255 for both. For our proposed method, we set the backward smoothing parameter γ = 1 and step size as 8/255. For other fast training methods, we use a step size of 10/255 for Fast AT/GradAlign, 6/255 for 2-step Fast AT, 6/255 for Fast TRADES and 5/255 for 2-step Fast TRADES.
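A minimal L-infinity PGD sketch matching the quoted attack settings for Adversarial Training and TRADES (ϵ = 0.031, step size 2/255, 10 iterations) is shown below; the random start and the [0, 1] input range are assumptions made for illustration.

```python
# 10-step L-infinity PGD sketch with the quoted budget and step size; the random
# start and the [0, 1] pixel range are illustrative assumptions.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.031, alpha=2/255, steps=10):
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()        # ascent step on the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # project back into the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)              # keep valid pixel values
    return x_adv.detach()
```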