Linearity Grafting: Relaxed Neuron Pruning Helps Certifiable Robustness
Authors: Tianlong Chen, Huan Zhang, Zhenyu Zhang, Shiyu Chang, Sijia Liu, Pin-Yu Chen, Zhangyang Wang
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Experiments. Datasets and architectures. Our experiments are conducted on three representative datasets in adversarial robustness and verification literature, MNIST (Deng, 2012), SVHN (Netzer et al., 2011) and CIFAR-10 (Krizhevsky & Hinton, 2009). |
| Researcher Affiliation | Collaboration | 1University of Texas at Austin 2Carnegie Mellon University 3University of California, Santa Barbara 4Michigan State University 5MIT-IBM Watson AI Lab 6IBM Research. |
| Pseudocode | No | The paper describes procedures in text and figures (e.g., Figure 1), but it does not contain any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Codes are available at https://github.com/VITA-Group/Linearity-Grafting. |
| Open Datasets | Yes | Our experiments are conducted on three representative datasets in adversarial robustness and verification literature, MNIST (Deng, 2012), SVHN (Netzer et al., 2011) and CIFAR-10 (Krizhevsky & Hinton, 2009). (A minimal dataset-loading sketch follows the table.) |
| Dataset Splits | No | The paper discusses evaluation on “test sets” and notes that “VA is computed on the first 1,000 images,” but it does not provide specific train/validation/test dataset splits (e.g., percentages or counts) for reproducibility. |
| Hardware Specification | No | OOM indicates that DNNs have too many unstable neurons and the verifier is unable to load it with 48 GB GPU memory, leading to ∞ verification time and a null VA. |
| Software Dependencies | No | The paper mentions using an SGD optimizer and cosine annealing schedule, but it does not specify any software names with their version numbers (e.g., PyTorch version, CUDA version). |
| Experiment Setup | Yes | For fast adversarial training (Wong et al., 2020), we adopt the effective GradAlign regularization (Andriushchenko & Flammarion, 2020) with a coefficient of 0.2, for all 200 training epochs. The learning rate starts from 0.1 and decays by ten times at epochs 100 and 150, while the batch size is 128. We use an SGD optimizer with 0.9 momentum and 5 × 10⁻⁴ weight decay. During the finetuning of grafted networks, an initial learning rate of 0.01 is used for trainable slopes and intercept (a, b) of grafted neurons, and 0.001 for original model parameters. And the learning rate decays with a cosine annealing schedule of 100 training epochs. (A hedged PyTorch sketch of these settings follows the table.) |
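
All three cited datasets are publicly downloadable. As a convenience, and not as part of the paper or its released code, the sketch below loads them through torchvision; the root path, transform, and shuffling choices are illustrative assumptions, and only the batch size of 128 comes from the quoted setup.

```python
# Minimal sketch, assuming PyTorch/torchvision: the three datasets named in the
# paper (MNIST, SVHN, CIFAR-10) are all available as torchvision downloads.
# Paths and transforms below are illustrative, not taken from the paper.
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

mnist_train = datasets.MNIST("data", train=True, download=True, transform=to_tensor)
svhn_train = datasets.SVHN("data", split="train", download=True, transform=to_tensor)
cifar_train = datasets.CIFAR10("data", train=True, download=True, transform=to_tensor)

# Batch size 128 as reported for training; shuffling is a standard assumption.
train_loader = DataLoader(cifar_train, batch_size=128, shuffle=True)
```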
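
The quoted hyperparameters map onto standard PyTorch optimizers and schedulers. The following is a minimal sketch under that assumption: the placeholder model and the slope/intercept tensors are hypothetical stand-ins rather than the authors' released implementation (see the GitHub repository linked above), and the GradAlign regularizer itself is not implemented here.

```python
# Hedged sketch of the reported optimizer/scheduler settings, assuming PyTorch.
# All names below are placeholders; this is not the authors' released code.
import torch
import torch.nn as nn

# Placeholder network standing in for the paper's architectures.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

# Fast adversarial training phase (200 epochs, batch size 128): SGD with
# momentum 0.9 and weight decay 5e-4, lr 0.1 decayed by 10x at epochs 100 and 150.
# The GradAlign term (coefficient 0.2) would be added to the loss inside the
# training loop, which is omitted here.
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[100, 150], gamma=0.1)

# Fine-tuning of the grafted network: lr 0.01 for the trainable slopes/intercepts
# (a, b) of grafted neurons, lr 0.001 for the original parameters, with cosine
# annealing over 100 epochs. The (a, b) tensors are placeholders for illustration;
# other fine-tuning hyperparameters are not specified in the quoted text.
slopes = nn.Parameter(torch.ones(16))
intercepts = nn.Parameter(torch.zeros(16))
ft_opt = torch.optim.SGD(
    [
        {"params": [slopes, intercepts], "lr": 0.01},
        {"params": model.parameters(), "lr": 0.001},
    ],
    lr=0.001,  # default; both groups override it explicitly
)
ft_sched = torch.optim.lr_scheduler.CosineAnnealingLR(ft_opt, T_max=100)
```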