Exploring Architectural Ingredients of Adversarially Robust Deep Neural Networks

Authors: Hanxun Huang, Yisen Wang, Sarah Erfani, Quanquan Gu, James Bailey, Xingjun Ma

NeurIPS 2021

Reproducibility Variable Result LLM Response
Research Type Experimental In this paper, we address this gap via a comprehensive investigation on the impact of network width and depth on the robustness of adversarially trained DNNs. Specifically, we make the following key observations: 1) more parameters (higher model capacity) do not necessarily help adversarial robustness; 2) reducing capacity at the last stage (the last group of blocks) of the network can actually improve adversarial robustness; and 3) under the same parameter budget, there exists an optimal architectural configuration for adversarial robustness. We also provide a theoretical analysis explaining why such network configurations can help robustness. Our exploration of the relationship between DNN architectural configuration, Lipschitzness (size of the Lipschitz constant) and adversarial robustness starts with a fine-controlled grid search on the width/depth of the Wide ResNet (WRN) [27].
Researcher Affiliation Academia 1School of Computing and Information Systems, The University of Melbourne, Victoria, Australia 2Key Lab. of Machine Perception, School of Artificial Intelligence, Peking University, Beijing, China 3Institute for Artificial Intelligence, Peking University, Beijing, China 4University of California, Los Angeles, USA 5School of Computer Science, Fudan University, Shanghai, China
Pseudocode No The paper contains theoretical results and proofs in Appendix A.1, but does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code Yes Code is available at https://github.com/HanxunH/RobustWRN.
Open Datasets Yes We train all explored networks on CIFAR-10 dataset [1] using the standard adversarial training (SAT) with Projected Gradient Descent (PGD) [15] (see definition in equation (2)).
Dataset Splits No The paper mentions training and testing on CIFAR-10, but it does not explicitly provide details about a validation dataset split or how a validation set was used beyond general evaluation.
Hardware Specification Yes All experiments are conducted on NVIDIA Quadro RTX 8000 (48GB) and NVIDIA GeForce RTX 2080 Ti (11GB) GPUs.
Software Dependencies No The paper mentions using 'SGD optimizer', 'Batch normalization [82]', and 'cosine annealing [81]', but does not specify version numbers for any software libraries or frameworks (e.g., PyTorch, TensorFlow, Python version).
Experiment Setup Yes All models are trained for 100 epochs using the SGD optimizer with momentum 0.9 and weight decay 5e-4. Learning rate is initialized to 0.1 and decayed by 0.1 at epochs 75 and 90. A batch size of 128 is used. The learning rate scheduler is cosine annealing [81]. Batch normalization [82] is applied after each convolution layer. We constrain the L∞-norm of the maximum adversarial perturbation to ϵ = 8/255, and use 10-step PGD (PGD10) with step size α = 2/255. For evaluation, we use the 20-step PGD (PGD20) with step size α = ϵ/10.
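The attack settings reported above (L∞ budget ϵ = 8/255, PGD10 with step size α = 2/255 for training, PGD20 with α = ϵ/10 for evaluation) follow the standard min-max adversarial training recipe: maximize the loss within the ϵ-ball, then train on the resulting perturbed inputs. A minimal NumPy sketch of the inner PGD maximization is shown below; it is not the authors' implementation, and `grad_fn` is a stand-in for the input-gradient of the loss that a real model would supply via backpropagation.

```python
import numpy as np

def pgd_attack(x, grad_fn, eps=8/255, alpha=2/255, steps=10):
    """L-inf PGD: repeatedly step along the sign of the loss gradient,
    projecting back into the eps-ball around the clean input x and
    clipping to the valid pixel range [0, 1]."""
    rng = np.random.default_rng(0)
    x_adv = x + rng.uniform(-eps, eps, size=x.shape)  # random start
    x_adv = np.clip(x_adv, 0.0, 1.0)
    for _ in range(steps):
        g = grad_fn(x_adv)                        # dLoss/dInput (from backprop in practice)
        x_adv = x_adv + alpha * np.sign(g)        # signed ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep valid pixel range
    return x_adv

# Toy stand-in: loss(x) = sum(x^2) has gradient 2x, so the attack
# pushes every pixel to the edge of the eps-ball.
x = np.full((3, 4, 4), 0.5)
x_adv = pgd_attack(x, lambda z: 2 * z)
```

The evaluation attack reported in the paper reuses the same routine with `steps=20` and `alpha=eps/10`.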
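The quoted setup mentions both a step decay (0.1 initially, multiplied by 0.1 at epochs 75 and 90) and a cosine annealing scheduler, likely from different experiments in the paper. For reference, the step-decay variant can be expressed as a small helper; `step_lr` is a hypothetical name used here for illustration only.

```python
def step_lr(epoch, base_lr=0.1, milestones=(75, 90), gamma=0.1):
    """Piecewise-constant schedule: multiply the learning rate by gamma
    once the epoch reaches each milestone (matching the reported decay
    at epochs 75 and 90)."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr
```

In a framework such as PyTorch this corresponds to a multi-step schedule attached to the SGD optimizer with momentum 0.9 and weight decay 5e-4.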