Revisiting and Advancing Fast Adversarial Training Through The Lens of Bi-Level Optimization
Authors: Yihua Zhang, Guanhua Zhang, Prashant Khanduri, Mingyi Hong, Shiyu Chang, Sijia Liu
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In practice, we show our method yields substantial robustness improvements over baselines across multiple models and datasets. All experiments are run on a single GeForce RTX 3090 GPU. |
| Researcher Affiliation | Collaboration | 1 Michigan State University, 2 UC Santa Barbara, 3 University of Minnesota, 4 MIT-IBM Watson AI Lab, IBM Research. |
| Pseudocode | Yes | FAST-BAT algorithm. Lower-level solution: obtain $\delta^*(\theta_t)$ from (8), $\delta^*(\theta) = P_{\mathcal{C}}\big(z - (1/\lambda)\,\nabla_{\delta}\ell_{\mathrm{atk}}(\theta,\delta)\big\vert_{\delta=z}\big)$. Upper-level model training: integrating the IG (12) into (4), call SGD to update the model parameters as $\theta_{t+1} = \theta_t - \alpha_1 \nabla_{\theta}\ell_{\mathrm{tr}}(\theta_t,\delta^*) + (\alpha_2/\lambda)\,\nabla^2_{\theta\delta}\ell_{\mathrm{atk}}(\theta_t,\delta^*)\,H_{\mathcal{C}}\,\nabla_{\delta}\ell_{\mathrm{tr}}(\theta_t,\delta^*)$ (13), where $\alpha_1, \alpha_2 > 0$ are learning rates associated with the model gradient and the IG-augmented descent direction. (See the code sketch after the table.) |
| Open Source Code | Yes | Codes are available at https:// github.com/OPTML-Group/Fast-BAT. |
| Open Datasets | Yes | We will evaluate the effectiveness of our proposal under CIFAR-10 (Krizhevsky & Hinton, 2009), CIFAR-100 (Krizhevsky & Hinton, 2009), Tiny-ImageNet (Deng et al., 2009), and ImageNet (Deng et al., 2009). |
| Dataset Splits | Yes | Minor performance degradation compared to the results reported in the original papers is due to a different training setting: we train all the methods with only 90% of the training data, choose the best model on the validation set (the remaining 10% of the training data), and evaluate on the test set. (A split sketch is given after the table.) |
| Hardware Specification | Yes | All experiments are run on a single GeForce RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions software components such as the 'SGD optimizer' and 'cyclic scheduler' and implies the use of a standard deep-learning framework (e.g., PyTorch), but it does not specify version numbers for any of these dependencies, which are required for reproducibility. |
| Experiment Setup | Yes | Training details. We choose the training perturbation strength $\epsilon \in \{8, 16\}/255$ for CIFAR-10, CIFAR-100, and Tiny-ImageNet, and $\epsilon = 2/255$ for ImageNet, following (Wong et al., 2020; Andriushchenko & Flammarion, 2020). Throughout the experiments, we utilize an SGD optimizer with a momentum of 0.9 and a weight decay of $5 \times 10^{-4}$. For CIFAR-10, CIFAR-100, and Tiny-ImageNet, we train each model for 20 epochs in total, using a cyclic scheduler to adjust the learning rate: the learning rate linearly ascends from 0 to 0.2 within the first 10 epochs and then reduces to 0 within the last 10 epochs. Our batch size is set to 128 for all settings. In the implementation of FAST-BAT, we follow a dataset-agnostic hyperparameter scheme, with the value 255/5000 for $\epsilon = 8/255$ and 255/2500 for $\epsilon = 16/255$ on CIFAR-10, CIFAR-100, and Tiny-ImageNet. For ImageNet, we strictly follow the setup given by (Wong et al., 2020) and choose the train-time attack budget $\epsilon = 2/255$. (A minimal training-loop sketch follows the table.) |
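
To make the pseudocode row concrete, here is a minimal PyTorch sketch of one FAST-BAT-style update, assuming an $\ell_\infty$ perturbation set and an attack objective equal to the negative cross-entropy (plus a proximal term whose gradient vanishes at the starting point). The function name `fast_bat_step` and the argument names (`z`, `lmbda`, `alpha1`, `alpha2`) are illustrative, not taken from the authors' released code.

```python
import torch
import torch.nn.functional as F

def fast_bat_step(model, x, y, z, eps, lmbda, alpha1, alpha2):
    """One FAST-BAT-style update (illustrative sketch, not the official code).

    x, y   : clean inputs and labels
    z      : attack starting point for the lower-level problem
    eps    : ell_inf perturbation budget
    lmbda  : lower-level regularization constant (1/lmbda acts as the attack step size)
    alpha1, alpha2 : learning rates for the model gradient and the IG correction
    """
    params = [p for p in model.parameters() if p.requires_grad]

    # Lower level, eq. (8): one projected step from z. We assume l_atk is the
    # negative cross-entropy, so descending l_atk increases the training loss.
    delta = z.clone().detach().requires_grad_(True)
    atk_obj = -F.cross_entropy(model(x + delta), y)
    grad_atk = torch.autograd.grad(atk_obj, delta)[0]
    delta_star = torch.clamp(z - grad_atk / lmbda, -eps, eps)

    # H_C: diagonal 0/1 mask, 1 where delta* lies strictly inside the box
    # (projection inactive), 0 where the constraint is tight.
    h_mask = (delta_star.abs() < eps).float()

    # Upper level: gradients of the training loss at (theta_t, delta*).
    delta_star = delta_star.detach().requires_grad_(True)
    tr_loss = F.cross_entropy(model(x + delta_star), y)
    grads = torch.autograd.grad(tr_loss, params + [delta_star])
    grad_theta, grad_delta_tr = grads[:-1], grads[-1]

    # Implicit-gradient correction, eqs. (12)-(13): the mixed second-order term
    # grad^2_{theta,delta} l_atk(theta, delta*) @ (H_C grad_delta l_tr),
    # computed as a Hessian-vector product.
    delta2 = delta_star.detach().requires_grad_(True)
    atk_obj2 = -F.cross_entropy(model(x + delta2), y)
    g = torch.autograd.grad(atk_obj2, delta2, create_graph=True)[0]
    v = (h_mask * grad_delta_tr).detach()
    ig_corr = torch.autograd.grad((g * v).sum(), params)

    # SGD-style update, eq. (13):
    # theta <- theta - alpha1 * grad_theta + (alpha2 / lmbda) * IG correction.
    with torch.no_grad():
        for p, g1, g2 in zip(params, grad_theta, ig_corr):
            p.add_(-alpha1 * g1 + (alpha2 / lmbda) * g2)
```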
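
The 90%/10% train/validation split described in the Dataset Splits row can be reproduced with a few lines of torchvision/PyTorch; the fixed seed and the `./data` path below are illustrative choices, not taken from the paper.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# 90%/10% train/validation split of the CIFAR-10 training set; the official
# test set is kept untouched for the final evaluation.
full_train = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())
n_val = len(full_train) // 10
train_set, val_set = random_split(
    full_train, [len(full_train) - n_val, n_val],
    generator=torch.Generator().manual_seed(0))  # seed chosen for illustration
test_set = datasets.CIFAR10(root="./data", train=False,
                            transform=transforms.ToTensor())
```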
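
As a reading aid for the experiment-setup row, the following PyTorch sketch wires up an SGD optimizer (momentum 0.9, weight decay 5e-4) with the described cyclic learning-rate schedule (0 to 0.2 over the first 10 epochs, back to 0 over the last 10, stepped per batch). The toy model, random data, and plain cross-entropy objective are placeholders, not the paper's architecture or adversarial training objective.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins so the snippet runs end to end; substitute the paper's model,
# the real CIFAR-10 loaders, and the adversarial training objective.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
train_loader = DataLoader(
    TensorDataset(torch.randn(512, 3, 32, 32), torch.randint(0, 10, (512,))),
    batch_size=128, shuffle=True)

epochs, max_lr = 20, 0.2
steps_per_epoch = len(train_loader)

optimizer = optim.SGD(model.parameters(), lr=max_lr,
                      momentum=0.9, weight_decay=5e-4)
scheduler = optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=0.0, max_lr=max_lr,
    step_size_up=10 * steps_per_epoch,    # LR ramps linearly 0 -> 0.2 (epochs 1-10)
    step_size_down=10 * steps_per_epoch,  # then decays linearly 0.2 -> 0 (epochs 11-20)
    cycle_momentum=False)                 # keep momentum fixed at 0.9

for epoch in range(epochs):
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(x), y)  # placeholder objective
        loss.backward()
        optimizer.step()
        scheduler.step()  # the cyclic schedule is advanced once per batch
```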