Exploring Architectural Ingredients of Adversarially Robust Deep Neural Networks

Authors: Hanxun Huang, Yisen Wang, Sarah Erfani, Quanquan Gu, James Bailey, Xingjun Ma

NeurIPS 2021

Reproducibility Variable Result LLM Response
Research Type Experimental In this paper, we address this gap via a comprehensive investigation on the impact of network width and depth on the robustness of adversarially trained DNNs. Specifically, we make the following key observations: 1) more parameters (higher model capacity) do not necessarily help adversarial robustness; 2) reducing capacity at the last stage (the last group of blocks) of the network can actually improve adversarial robustness; and 3) under the same parameter budget, there exists an optimal architectural configuration for adversarial robustness. We also provide a theoretical analysis explaining why such network configurations can help robustness. Our exploration of the relationship between DNN architectural configuration, Lipschitzness (size of the Lipschitz constant) and adversarial robustness starts with a fine-controlled grid search on the width/depth of the Wide ResNet (WRN) [27].
Researcher Affiliation Academia 1School of Computing and Information Systems, The University of Melbourne, Victoria, Australia 2Key Lab. of Machine Perception, School of Artificial Intelligence, Peking University, Beijing, China 3Institute for Artificial Intelligence, Peking University, Beijing, China 4University of California, Los Angeles, USA 5School of Computer Science, Fudan University, Shanghai, China
Pseudocode No The paper contains theoretical results and proofs in Appendix A.1, but does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code Yes Code is available at https://github.com/HanxunH/RobustWRN.
Open Datasets Yes We train all explored networks on CIFAR-10 dataset [1] using the standard adversarial training (SAT) with Projected Gradient Descent (PGD) [15] (see definition in equation (2)).
Dataset Splits No The paper mentions training and testing on CIFAR-10, but it does not explicitly provide details about a validation dataset split or how a validation set was used beyond general evaluation.
Hardware Specification Yes All experiments are conducted on NVIDIA Quadro RTX 8000 (48GB) and NVIDIA GeForce RTX 2080 Ti (11GB) GPUs.
Software Dependencies No The paper mentions using 'SGD optimizer', 'Batch normalization [82]', and 'cosine annealing [81]', but does not specify version numbers for any software libraries or frameworks (e.g., PyTorch, TensorFlow, Python version).
Experiment Setup Yes All models are trained for 100 epochs using the SGD optimizer with momentum 0.9 and weight decay 5e-4. Learning rate is initialized to 0.1 and decayed by 0.1 at epochs 75 and 90. A batch size of 128 is used. The learning rate scheduler is cosine annealing [81]. Batch normalization [82] is applied after each convolution layer. We constrain the L∞-norm of the maximum adversarial perturbation to ϵ = 8/255, and use 10-step PGD (PGD10) with step size α = 2/255. For evaluation, we use the 20-step PGD (PGD20) with step size α = ϵ/10.
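The attack settings reported above (L∞ budget ϵ = 8/255, PGD10 with step size α = 2/255 for training, PGD20 with α = ϵ/10 for evaluation) follow the standard min-max adversarial training recipe: maximize the loss within the ϵ-ball, then train on the resulting perturbed inputs. A minimal NumPy sketch of the inner PGD maximization is shown below; it is not the authors' implementation, and `grad_fn` is a stand-in for the input-gradient of the loss that a real model would supply via backpropagation.

```python
import numpy as np

def pgd_attack(x, grad_fn, eps=8/255, alpha=2/255, steps=10):
    """L-inf PGD: repeatedly step along the sign of the loss gradient,
    projecting back into the eps-ball around the clean input x and
    clipping to the valid pixel range [0, 1]."""
    rng = np.random.default_rng(0)
    x_adv = x + rng.uniform(-eps, eps, size=x.shape)  # random start
    x_adv = np.clip(x_adv, 0.0, 1.0)
    for _ in range(steps):
        g = grad_fn(x_adv)                        # dLoss/dInput (from backprop in practice)
        x_adv = x_adv + alpha * np.sign(g)        # signed ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep valid pixel range
    return x_adv

# Toy stand-in: loss(x) = sum(x^2) has gradient 2x, so the attack
# pushes every pixel to the edge of the eps-ball.
x = np.full((3, 4, 4), 0.5)
x_adv = pgd_attack(x, lambda z: 2 * z)
```

The evaluation attack reported in the paper reuses the same routine with `steps=20` and `alpha=eps/10`.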
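The quoted setup mentions both a step decay (0.1 initially, multiplied by 0.1 at epochs 75 and 90) and a cosine annealing scheduler, likely from different experiments in the paper. For reference, the step-decay variant can be expressed as a small helper; `step_lr` is a hypothetical name used here for illustration only.

```python
def step_lr(epoch, base_lr=0.1, milestones=(75, 90), gamma=0.1):
    """Piecewise-constant schedule: multiply the learning rate by gamma
    once the epoch reaches each milestone (matching the reported decay
    at epochs 75 and 90)."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr
```

In a framework such as PyTorch this corresponds to a multi-step schedule attached to the SGD optimizer with momentum 0.9 and weight decay 5e-4.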