Proper Network Interpretability Helps Adversarial Robustness in Classification

Authors: Akhilan Boopathy, Sijia Liu, Gaoyuan Zhang, Cynthia Liu, Pin-Yu Chen, Shiyu Chang, Luca Daniel

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we theoretically show that with a proper measurement of interpretation, it is actually difficult to prevent prediction-evasion adversarial attacks from causing interpretation discrepancy, as confirmed by experiments on MNIST, CIFAR-10 and Restricted ImageNet. We empirically show that interpretability alone can be used to defend against adversarial attacks for both misclassification and misinterpretation.
Researcher Affiliation | Collaboration | 1. Massachusetts Institute of Technology; 2. MIT-IBM Watson AI Lab, IBM Research.
Pseudocode | No | The paper describes mathematical formulations and methods but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format.
Open Source Code | Yes | Our codes are available at https://github.com/AkhilanB/Proper-Interpretability
Open Datasets | Yes | We evaluate networks trained on the MNIST and CIFAR-10 datasets, and a Restricted ImageNet (R-ImageNet) dataset used in (Tsipras et al., 2019).
Dataset Splits | No | The paper uses standard datasets (MNIST, CIFAR-10, and Restricted ImageNet) but does not explicitly provide specific percentages, sample counts, or detailed methodologies for their train/validation/test splits. While it mentions evaluating on '200 random test set points', this does not define the full data partitioning needed for reproduction.
Hardware Specification | Yes | Training times are evaluated on a 2.60 GHz Intel Xeon CPU.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or other libraries).
Experiment Setup | Yes | Unless specified otherwise, we choose the perturbation size ϵ = 0.3 on MNIST, 8/255 on CIFAR-10 and 0.003 for R-ImageNet for robust training under an ℓ∞ perturbation norm. Also, we set the regularization parameter γ as 0.01 in (8); see a justification in Appendix F. (A hedged configuration sketch illustrating these values follows the table.)
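
The Experiment Setup row quotes the paper's ℓ∞ budgets and regularization weight. The snippet below is a minimal, hedged sketch (not the authors' released code) of how those values would typically enter an ℓ∞-constrained robust-training pipeline: the dataset-to-ϵ mapping, the γ = 0.01 weight from Eq. (8), and the ℓ∞ projection step are the only pieces taken from the quoted setup; every function and variable name here is hypothetical.

```python
import numpy as np

# l_inf perturbation budgets quoted in the Experiment Setup row (assumption:
# pixel values are scaled to [0, 1], as is standard for these budgets).
EPSILONS = {"mnist": 0.3, "cifar10": 8 / 255, "r_imagenet": 0.003}
GAMMA = 0.01  # regularization weight gamma from Eq. (8) of the paper

def project_linf(x_adv, x_clean, eps):
    """Project a perturbed input back into the l_inf ball of radius eps
    around the clean input, then clip to the valid pixel range [0, 1]."""
    x_adv = np.clip(x_adv, x_clean - eps, x_clean + eps)
    return np.clip(x_adv, 0.0, 1.0)

# Usage example with the MNIST budget (placeholder data, no real model).
x_clean = np.random.rand(1, 28, 28)
x_adv = x_clean + np.random.uniform(-1.0, 1.0, size=x_clean.shape)
x_adv = project_linf(x_adv, x_clean, EPSILONS["mnist"])
assert np.abs(x_adv - x_clean).max() <= EPSILONS["mnist"] + 1e-8
```

The projection above is the standard clip-to-ball step used in PGD-style ℓ∞ robust training; the interpretation-discrepancy regularizer from Eq. (8), which GAMMA would weight in the training loss, is not reproduced here.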