Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Proper Network Interpretability Helps Adversarial Robustness in Classification
Authors: Akhilan Boopathy, Sijia Liu, Gaoyuan Zhang, Cynthia Liu, Pin-Yu Chen, Shiyu Chang, Luca Daniel
ICML 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we theoretically show that with a proper measurement of interpretation, it is actually difficult to prevent prediction-evasion adversarial attacks from causing interpretation discrepancy, as confirmed by experiments on MNIST, CIFAR-10 and Restricted ImageNet. We empirically show that interpretability alone can be used to defend adversarial attacks for both misclassification and misinterpretation. |
| Researcher Affiliation | Collaboration | ¹Massachusetts Institute of Technology; ²MIT-IBM Watson AI Lab, IBM Research. |
| Pseudocode | No | The paper describes mathematical formulations and methods but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format. |
| Open Source Code | Yes | Our codes are available at https://github.com/AkhilanB/Proper-Interpretability |
| Open Datasets | Yes | We evaluate networks trained on the MNIST and CIFAR-10 datasets, and a Restricted ImageNet (R-ImageNet) dataset used in (Tsipras et al., 2019). |
| Dataset Splits | No | The paper uses standard datasets like MNIST, CIFAR-10, and Restricted ImageNet but does not explicitly provide specific percentages, sample counts, or detailed methodologies for their train/validation/test splits. While it mentions evaluating on '200 random test set points', this does not define the full data partitioning for reproduction. |
| Hardware Specification | Yes | Training times are evaluated on a 2.60 GHz Intel Xeon CPU. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or other libraries). |
| Experiment Setup | Yes | Unless specified otherwise, we choose the perturbation size ϵ = 0.3 on MNIST, 8/255 on CIFAR and 0.003 for R-ImageNet for robust training under an ℓ∞ perturbation norm. Also, we set the regularization parameter γ as 0.01 in (8); see a justification in Appendix F. |
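
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration, shown here as a minimal sketch rather than the authors' code. The names `EPSILON`, `GAMMA`, and `project_linf` are hypothetical, and the pixel range [0, 1] is an assumption not stated in the table. The helper only illustrates what a per-coordinate perturbation budget ϵ means.

```python
# Perturbation budgets quoted in the Experiment Setup row.
# Dictionary keys are illustrative names, not identifiers from the paper's code.
EPSILON = {"mnist": 0.3, "cifar10": 8 / 255, "r_imagenet": 0.003}

# Regularization parameter gamma quoted for Eq. (8) of the paper.
GAMMA = 0.01


def project_linf(x, x_adv, eps, lo=0.0, hi=1.0):
    """Clip x_adv so each coordinate is within eps of x and inside [lo, hi].

    The [lo, hi] = [0, 1] pixel range is an assumption made for this sketch.
    """
    projected = []
    for xi, ai in zip(x, x_adv):
        ai = min(max(ai, xi - eps), xi + eps)  # stay inside the eps-ball around x
        projected.append(min(max(ai, lo), hi))  # stay inside the valid pixel range
    return projected


if __name__ == "__main__":
    clean = [0.5, 0.9]
    perturbed = [1.0, 1.5]
    print(project_linf(clean, perturbed, EPSILON["mnist"]))
```

With the MNIST budget, the first coordinate is pulled back to roughly 0.8 (the edge of the ϵ-ball) and the second is capped at the assumed upper pixel bound of 1.0.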