Learning Diverse-Structured Networks for Adversarial Robustness
Authors: Xuefeng Du, Jingfeng Zhang, Bo Han, Tongliang Liu, Yu Rong, Gang Niu, Junzhou Huang, Masashi Sugiyama
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results demonstrate the advantages of DS-Net, i.e., weighting the atomic blocks. In this section, we present empirical evidence to validate DS-Net on benchmarks with three AT styles. |
| Researcher Affiliation | Collaboration | 1Hong Kong Baptist University 2University of Wisconsin-Madison 3RIKEN 4University of Sydney 5Tencent AI Lab 6University of Tokyo. |
| Pseudocode | Yes | Algorithm 1 (Diverse-Structured Network). Input: input data x ∈ X with label y ∈ Y, model f_W with block parameters W, loss function ℓ, maximum PGD steps K, perturbation bound ε, step size α, and randomly initialized attention weights w. Output: learned model f_W and attention weights w. While not evaluating: Step 1: fix w and W, generate adversarial data by Eq. (7); Step 2: update w and W by Eq. (8). While evaluating: Step 3: fix w and W, generate adversarial data by Eq. (7); Step 4: calculate the output by Eq. (6) and report accuracy. (See the training-loop sketch after the table.) |
| Open Source Code | Yes | The code is available at https://github.com/d12306/dsnet. |
| Open Datasets | Yes | We evaluated DS-Net on CIFAR-10 and SVHN. |
| Dataset Splits | Yes | We have tried to use 1,000 images from the training set as a validation set to determine the stopping point, which aligns with our selection point. (See the data-split sketch after the table.) |
| Hardware Specification | Yes | We trained on one Tesla V100. |
| Software Dependencies | No | The paper mentions using 'apex' for mixed-precision acceleration but does not provide specific version numbers for this or any other software dependencies like deep learning frameworks or programming languages. |
| Experiment Setup | Yes | For CIFAR-10, during training, we set the perturbation bound ε to 0.031 and the step size α to 0.007 with 10 steps. We used the SGD optimizer with a momentum of 0.9 and a weight decay of 5e-4. The initial learning rate is 0.1. We trained for 120 epochs for standard AT, and the learning rate is multiplied by 0.1 and 0.01 at epochs 60 and 90. For TRADES, we trained for 85 epochs, and the learning rate is multiplied by 0.1 at epoch 75. We tested the performance when the model is trained with regularization factors β = 1 and β = 6. For MART, we trained for 90 epochs, and the learning rate is multiplied by 0.1 at epoch 60. We set β = 6. The batch size is set to 128. For SVHN, the step size is set to 0.003 with ε = 0.031. The number of training epochs, including the learning-rate decay epochs, is reduced by 20 for AT, TRADES, and MART. We select all models one epoch after the first learning-rate decay point, following Rice et al. (2020), because robust overfitting also happens for DS-Net. We have tried to use 1,000 images from the training set as a validation set to determine the stopping point, which aligns with our selection point. We used the Adam optimizer (Kingma & Ba, 2014) with a learning rate of 1e-3 and a weight decay of 1e-3 to optimize the attention weights, which are then normalized by a softmax function. The comparison with other optimizers is shown in Appendix E. We set the number of layers to 15 and the initial channel number to 20. We used two residual layers at 1/3 of the total depth of DS-Net to increase the channels by factors of k and 2, respectively. Meanwhile, the spatial size of the feature map is reduced by half. We set k = 4 or 6 to obtain a small and a large DS-Net in our experiments, denoted as DS-Net-4/6-softmax. (An optimizer and schedule sketch follows the table.) |
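The Algorithm 1 loop quoted above maps onto a standard adversarial-training step: craft adversarial examples with the current weights fixed, then update both the block parameters W and the attention weights w on those examples. Below is a minimal PyTorch sketch of that loop, assuming a DS-Net-like `model` whose forward pass already applies the softmax-normalized attention weights; a plain cross-entropy PGD inner loop stands in for Eq. (7) and Eq. (8), so this illustrates the style of the algorithm rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.031, alpha=0.007, steps=10):
    """Craft adversarial examples with PGD (stands in for Eq. (7))."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project back into the eps-ball around x and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv.detach()

def train_epoch(model, opt_blocks, opt_attn, loader, device):
    """One pass of the training phase of Algorithm 1: Step 1 fixes (w, W) to
    craft adversarial data, Step 2 updates block parameters W and attention
    weights w on that data."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y)           # Step 1 (cf. Eq. (7))
        loss = F.cross_entropy(model(x_adv), y)   # AT objective (cf. Eq. (8))
        opt_blocks.zero_grad()
        opt_attn.zero_grad()
        loss.backward()
        opt_blocks.step()
        opt_attn.step()
```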
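The 1,000-image validation split mentioned under Dataset Splits could be carved out of the CIFAR-10 training set as follows; the seed and the plain `ToTensor` transform are placeholders rather than details taken from the paper.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

train_full = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())

# Hold out 1,000 training images as a validation set for picking the stopping point.
g = torch.Generator().manual_seed(0)  # seed is a placeholder, not from the paper
train_set, val_set = random_split(train_full,
                                  [len(train_full) - 1000, 1000],
                                  generator=g)
```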
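One way to wire up the reported CIFAR-10 hyperparameters for standard AT in PyTorch is sketched below; `model.block_parameters()` and `model.attention_weights` are hypothetical accessors for the two parameter groups, and only the standard-AT schedule (120 epochs, decays at epochs 60 and 90) is shown.

```python
import torch

def build_optimizers(model):
    """Set up the reported optimizers and schedule for standard AT on CIFAR-10."""
    # SGD over the block parameters W: lr 0.1, momentum 0.9, weight decay 5e-4.
    opt_blocks = torch.optim.SGD(model.block_parameters(),   # hypothetical accessor
                                 lr=0.1, momentum=0.9, weight_decay=5e-4)
    # 120 epochs; lr multiplied by 0.1 at epoch 60 and by 0.01 (relative to the
    # initial lr) at epoch 90, i.e. MultiStepLR with gamma=0.1.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(opt_blocks,
                                                     milestones=[60, 90],
                                                     gamma=0.1)
    # Adam over the attention weights w: lr 1e-3, weight decay 1e-3; the weights
    # are normalized by a softmax inside the model's forward pass.
    opt_attn = torch.optim.Adam([model.attention_weights],   # hypothetical parameter
                                lr=1e-3, weight_decay=1e-3)
    return opt_blocks, opt_attn, scheduler
```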