Do Wider Neural Networks Really Help Adversarial Robustness?

Authors: Boxi Wu, Jinghui Chen, Deng Cai, Xiaofei He, Quanquan Gu

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this paper, we carefully examine the relationship between network width and model robustness. Specifically, we show that the model robustness is closely related to the tradeoff between natural accuracy and perturbation stability, which is controlled by the robust regularization parameter λ. With the same λ, wider networks can achieve better natural accuracy but worse perturbation stability, leading to a potentially worse overall model robustness."
Researcher Affiliation | Academia | Boxi Wu (State Key Lab of CAD&CG, Zhejiang University, boxiwu@zju.edu.cn); Jinghui Chen (Pennsylvania State University, State College, PA 16801, jzc5917@psu.edu); Deng Cai (State Key Lab of CAD&CG, Zhejiang University, dengcai@cad.zju.edu.cn); Xiaofei He (State Key Lab of CAD&CG, Zhejiang University, xiaofeihe@cad.zju.edu.cn); Quanquan Gu (Dept. of Computer Science, UCLA, qgu@cs.ucla.edu)
Pseudocode | Yes | "Algorithm 1: Width Adjusted Regularization"
Open Source Code | Yes | "(a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]"
Open Datasets | Yes | "We conduct our experiments on CIFAR10 [37] dataset, which is the most popular dataset in the adversarial training literature. It contains images from 10 different categories, with 50k images for training and 10k for testing."
Dataset Splits | No | The paper states that the CIFAR10 dataset contains "50k images for training and 10k for testing" but does not explicitly mention a separate validation split or its size/percentage.
Hardware Specification | Yes | "All experiments are conducted on a single NVIDIA V100 GPU."
Software Dependencies | No | The paper provides pseudocode for its algorithm but does not name any software dependency with a version number (e.g., a PyTorch, TensorFlow, or CUDA version) for its implementation.
Experiment Setup | Yes | "The batch size is set to 128, and we train each model for 100 epochs. The initial learning rate is set to be 0.1. We adopt a slightly different learning rate decay schedule: instead of dividing the learning rate by 10 after 75-th epoch and 90-th epoch as in [41, 70, 63], we halve the learning rate for every epoch after the 75-th epoch, for the purpose of preventing over-fitting. For evaluating the model robustness, we perform the standard PGD attack [41] using 20 steps with step size 0.007, and ε = 8/255."
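The Research Type row describes robustness as a tradeoff between natural accuracy and perturbation stability, weighted by a regularization parameter λ. The exact objective is not reproduced in this summary; the sketch below assumes the common TRADES-style decomposition (natural loss plus λ times a stability term), so all function names here are illustrative, not the paper's code.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) between two discrete predictive distributions."""
    return float(np.sum(p * np.log(p / q)))

def robust_loss(nat_loss, p_clean, p_adv, lam):
    """Illustrative TRADES-style objective: the natural loss plus
    lambda times a stability term measuring how far the prediction
    moves under an adversarial perturbation. A larger lambda trades
    natural accuracy for perturbation stability."""
    return nat_loss + lam * kl_divergence(p_clean, p_adv)
```

Under this reading, the paper's observation is that at a fixed λ, widening the network improves the first term but worsens the second, so overall robustness can degrade.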
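The learning-rate schedule quoted in the Experiment Setup row can be written down directly; a minimal sketch, assuming 1-indexed epochs, alongside the conventional step schedule the authors replace:

```python
def lr_at_epoch(epoch, base_lr=0.1, decay_start=75):
    """Paper's schedule: 0.1 for the first 75 epochs, then the
    learning rate is halved every epoch thereafter."""
    if epoch <= decay_start:
        return base_lr
    return base_lr * 0.5 ** (epoch - decay_start)

def step_lr_at_epoch(epoch, base_lr=0.1):
    """Conventional schedule from [41, 70, 63]: divide by 10 after
    the 75th epoch and again after the 90th."""
    if epoch <= 75:
        return base_lr
    if epoch <= 90:
        return base_lr / 10
    return base_lr / 100
```

By epoch 100 the halving schedule has decayed the learning rate far below 0.001, which is the over-fitting prevention the authors cite.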
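The robustness evaluation uses standard L∞ PGD with 20 steps, step size 0.007, and ε = 8/255. A self-contained sketch of that attack follows; to keep it runnable it targets a toy logistic model p = sigmoid(w·x + b), whose input gradient is available in closed form, rather than the wide ResNets the paper actually attacks.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pgd_attack(x, y, w, b, eps=8 / 255, alpha=0.007, steps=20):
    """L_inf PGD with the paper's evaluation settings (20 steps,
    step size 0.007, eps = 8/255) against a toy logistic model.
    For binary cross-entropy, the input gradient is (p - y) * w."""
    x_adv = x.astype(float).copy()
    for _ in range(steps):
        p = sigmoid(x_adv @ w + b)
        grad = (p - y) * w                        # ascend the loss
        x_adv = x_adv + alpha * np.sign(grad)     # signed gradient step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay in valid pixel range
    return x_adv
```

Note that 20 steps of size 0.007 could move each coordinate by 0.14, well past ε = 8/255 ≈ 0.031, so it is the projection step that keeps the perturbation inside the threat model.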