Proper Network Interpretability Helps Adversarial Robustness in Classification
Authors: Akhilan Boopathy, Sijia Liu, Gaoyuan Zhang, Cynthia Liu, Pin-Yu Chen, Shiyu Chang, Luca Daniel
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we theoretically show that with a proper measurement of interpretation, it is actually difficult to prevent prediction-evasion adversarial attacks from causing interpretation discrepancy, as confirmed by experiments on MNIST, CIFAR-10 and Restricted ImageNet. We empirically show that interpretability alone can be used to defend against adversarial attacks for both misclassification and misinterpretation. |
| Researcher Affiliation | Collaboration | ¹Massachusetts Institute of Technology, ²MIT-IBM Watson AI Lab, IBM Research. |
| Pseudocode | No | The paper describes mathematical formulations and methods but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format. |
| Open Source Code | Yes | Our codes are available at https://github.com/AkhilanB/Proper-Interpretability |
| Open Datasets | Yes | We evaluate networks trained on the MNIST and CIFAR-10 datasets, and a Restricted ImageNet (R-ImageNet) dataset used in (Tsipras et al., 2019). |
| Dataset Splits | No | The paper uses standard datasets like MNIST, CIFAR-10, and Restricted ImageNet but does not explicitly provide specific percentages, sample counts, or detailed methodologies for their train/validation/test splits. While it mentions evaluating on '200 random test set points', this does not define the full data partitioning for reproduction. |
| Hardware Specification | Yes | Training times are evaluated on a 2.60 GHz Intel Xeon CPU. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or other libraries). |
| Experiment Setup | Yes | Unless specified otherwise, we choose the perturbation size ϵ = 0.3 on MNIST, 8/255 on CIFAR and 0.003 for R-ImageNet for robust training under an ℓ∞ perturbation norm. Also, we set the regularization parameter γ as 0.01 in (8); see a justification in Appendix F. |
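
For context, the quoted setup amounts to a small set of per-dataset hyperparameters. The sketch below is a minimal illustration in Python; the dictionary layout, key names, and the `get_config` helper are assumptions for illustration and are not taken from the authors' repository. Only the numeric values (ϵ per dataset and γ = 0.01) come from the paper's stated setup.

```python
# Minimal sketch of the reported robust-training hyperparameters.
# Structure and names are hypothetical; only the numeric values
# (epsilon per dataset, gamma = 0.01) are quoted from the paper.

ROBUST_TRAINING_CONFIG = {
    "MNIST":      {"epsilon": 0.3},       # l_inf perturbation size
    "CIFAR-10":   {"epsilon": 8 / 255},
    "R-ImageNet": {"epsilon": 0.003},
}

INTERPRETABILITY_REG_GAMMA = 0.01  # regularization parameter gamma in Eq. (8)

def get_config(dataset: str) -> dict:
    """Return the assumed training settings for a given dataset."""
    cfg = dict(ROBUST_TRAINING_CONFIG[dataset])
    cfg["gamma"] = INTERPRETABILITY_REG_GAMMA
    return cfg

if __name__ == "__main__":
    print(get_config("CIFAR-10"))  # {'epsilon': 0.0313..., 'gamma': 0.01}
```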