Linearity Grafting: Relaxed Neuron Pruning Helps Certifiable Robustness

Authors: Tianlong Chen, Huan Zhang, Zhenyu Zhang, Shiyu Chang, Sijia Liu, Pin-Yu Chen, Zhangyang Wang

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5. Experiments. Datasets and architectures. Our experiments are conducted on three representative datasets in adversarial robustness and verification literature, MNIST (Deng, 2012), SVHN (Netzer et al., 2011) and CIFAR-10 (Krizhevsky & Hinton, 2009).
Researcher Affiliation | Collaboration | 1) University of Texas at Austin; 2) Carnegie Mellon University; 3) University of California, Santa Barbara; 4) Michigan State University; 5) MIT-IBM Watson AI Lab; 6) IBM Research.
Pseudocode | No | The paper describes procedures in text and figures (e.g., Figure 1), but it does not contain any formal pseudocode or algorithm blocks. (An illustrative sketch of the grafting operation appears below the table.)
Open Source Code | Yes | Codes are available at https://github.com/VITA-Group/Linearity-Grafting.
Open Datasets | Yes | Our experiments are conducted on three representative datasets in adversarial robustness and verification literature, MNIST (Deng, 2012), SVHN (Netzer et al., 2011) and CIFAR-10 (Krizhevsky & Hinton, 2009).
Dataset Splits | No | The paper discusses evaluation on “test sets” and notes that “VA is computed on the first 1,000 images,” but it does not provide specific train/validation/test dataset splits (e.g., percentages or counts) for reproducibility. (See the data-pipeline sketch below the table.)
Hardware Specification | No | OOM indicates that DNNs have too many unstable neurons and the verifier is unable to load it with 48 GB GPU memory, leading to ∞ verification time and a null VA (–).
Software Dependencies | No | The paper mentions using an SGD optimizer and cosine annealing schedule, but it does not specify any software names with their version numbers (e.g., PyTorch version, CUDA version).
Experiment Setup | Yes | For fast adversarial training (Wong et al., 2020), we adopt the effective GradAlign regularization (Andriushchenko & Flammarion, 2020) with a coefficient of 0.2, for all 200 training epochs. The learning rate starts from 0.1 and decays by ten times at epochs 100 and 150, while the batch size is 128. We use an SGD optimizer with 0.9 momentum and 5×10⁻⁴ weight decay. During the finetuning of grafted networks, an initial learning rate of 0.01 is used for trainable slopes and intercept (a, b) of grafted neurons, and 0.001 for original model parameters. And the learning rate decays with a cosine annealing schedule of 100 training epochs. (A configuration sketch mapping these hyperparameters onto PyTorch objects appears below the table.)
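
The sketches below are illustrative Python/PyTorch reconstructions based only on the quotes in the table above; they are not the authors' released code, which lives at the GitHub link in the table.

Although the paper gives no formal pseudocode, the grafting operation it describes, replacing selected ReLU neurons with a linear function whose slope a and intercept b are trainable, can be pictured as a drop-in activation module. The class name, mask handling, and initialization below are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class GraftedActivation(nn.Module):
    """Illustrative sketch (not the paper's code): a ReLU layer in which a
    chosen subset of neurons is grafted, i.e. replaced by a trainable
    linear map a * x + b."""

    def __init__(self, num_neurons: int, graft_mask: torch.Tensor):
        super().__init__()
        # graft_mask[i] = True means neuron i is grafted (linearized).
        self.register_buffer("graft_mask", graft_mask.float())
        # Trainable slope a and intercept b for every neuron; only the
        # grafted neurons actually use them in the forward pass.
        self.a = nn.Parameter(torch.ones(num_neurons))
        self.b = nn.Parameter(torch.zeros(num_neurons))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        relu_out = torch.relu(x)
        linear_out = self.a * x + self.b
        # Grafted neurons follow the linear branch; the rest stay ReLU.
        return self.graft_mask * linear_out + (1.0 - self.graft_mask) * relu_out
```

Which neurons are selected for grafting and how (a, b) are finetuned are exactly the parts the paper describes in text and figures (e.g., Figure 1) rather than in pseudocode.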
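
The datasets named in the quotes (MNIST, SVHN, CIFAR-10) are standard torchvision downloads, and the report notes that verified accuracy (VA) is computed on the first 1,000 test images. A minimal data-pipeline sketch, assuming the default torchvision splits (the paper states no custom re-splitting) and placeholder root paths and transforms:

```python
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader, Subset

# Standard torchvision splits for the three datasets cited in the paper;
# root paths and transforms are placeholder assumptions.
transform = T.ToTensor()
mnist_train = torchvision.datasets.MNIST("./data", train=True, download=True, transform=transform)
svhn_train = torchvision.datasets.SVHN("./data", split="train", download=True, transform=transform)
cifar_train = torchvision.datasets.CIFAR10("./data", train=True, download=True, transform=transform)
cifar_test = torchvision.datasets.CIFAR10("./data", train=False, download=True, transform=transform)

# Batch size 128 matches the quoted training setup.
train_loader = DataLoader(cifar_train, batch_size=128, shuffle=True)

# "VA is computed on the first 1,000 images": take test indices 0..999 in order.
va_subset = Subset(cifar_test, list(range(1000)))
va_loader = DataLoader(va_subset, batch_size=1, shuffle=False)
```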
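
The quoted "Experiment Setup" hyperparameters map directly onto standard PyTorch optimizer and scheduler objects. The sketch below mirrors the stated numbers; the toy model, the parameter-name filter used to separate the grafted (a, b) from the original weights, and the finetuning weight decay (left at the default) are assumptions, and the GradAlign regularization term itself is not implemented here.

```python
import torch.nn as nn
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR, CosineAnnealingLR

# Toy stand-in model; in the paper this would be the network being grafted.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU(), nn.Linear(256, 10))

# Stage 1: fast adversarial training for 200 epochs (the GradAlign term with
# coefficient 0.2 is added to the loss in the paper; not shown here).
opt = SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
sched = MultiStepLR(opt, milestones=[100, 150], gamma=0.1)  # decay by 10x at epochs 100, 150

# Stage 2: finetuning the grafted network for 100 epochs.
# Assumed naming: grafted slope/intercept parameters end in ".a" / ".b"
# (as in the GraftedActivation sketch above); everything else is "original".
graft_params = [p for n, p in model.named_parameters() if n.endswith((".a", ".b"))]
other_params = [p for n, p in model.named_parameters() if not n.endswith((".a", ".b"))]
ft_opt = SGD(
    [
        {"params": graft_params, "lr": 0.01},   # trainable slopes and intercepts (a, b)
        {"params": other_params, "lr": 0.001},  # original model parameters
    ],
    momentum=0.9,  # finetuning weight decay is not stated in the quote, so left at the default 0
)
ft_sched = CosineAnnealingLR(ft_opt, T_max=100)  # cosine annealing over 100 epochs
```

The batch size of 128 and the 200-/100-epoch budgets from the quote would be applied in the training loops, which are omitted here.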