Interpretation of Neural Networks Is Fragile
Authors: Amirata Ghorbani, Abubakar Abid, James Zou
AAAI 2019, pp. 3681-3688 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We systematically characterize the robustness of interpretations generated by several widely-used feature importance interpretation methods (feature importance maps, integrated gradients, and DeepLIFT) on ImageNet and CIFAR-10. In all cases, our experiments show that systematic perturbations can lead to dramatically different interpretations without changing the label. We extend these results to show that interpretations based on exemplars (e.g. influence functions) are similarly susceptible to adversarial attack. |
| Researcher Affiliation | Academia | Amirata Ghorbani, Abubakar Abid, James Zou Stanford University 450 Serra Mall, Stanford, CA, USA {amiratag, a12d, jamesz}@stanford.edu |
| Pseudocode | Yes | Algorithm 1: Iterative Feature Importance Attacks (a hedged code sketch of this attack appears after the table). |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for its described methodology. |
| Open Datasets | Yes | Data sets and models: For attacks against feature importance interpretation, we used ILSVRC2012 (ImageNet classification challenge data) (Russakovsky et al. 2015) and CIFAR-10 (Krizhevsky 2009). For the ImageNet classification data set, we used a pre-trained SqueezeNet model introduced by (Iandola et al. 2016). For both data sets, the results are examined on feature importance scores obtained by simple gradient, integrated gradients, and DeepLIFT methods. For DeepLIFT, we used the pixel-wise and the channel-wise mean images as the CIFAR-10 and ImageNet reference points, respectively. For the integrated gradients method, the same references were used with parameter M = 100. We ran all iterative attack algorithms for P = 300 iterations with step size α = 0.5. To evaluate our adversarial attack against influence functions, we followed a similar experimental setup to that of the original authors: we trained an InceptionNet v3 with all but the last layer frozen (the weights were pre-trained on ImageNet and obtained from Keras). The last layer was trained on a binary flower classification task (roses vs. sunflowers), using a data set consisting of 1,000 training images. This data set was chosen because it consisted of images that the network had not seen during pre-training on ImageNet. The network achieved a validation accuracy of 97.5%. |
| Dataset Splits | No | No explicit training/test/validation split percentages or counts are provided for the ImageNet and CIFAR-10 datasets beyond the statement that they were used for evaluation. For the flower dataset, the paper states "1,000 training images" and a "validation accuracy of 97.5%", but gives no count or percentage for the validation set size. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments. |
| Software Dependencies | No | The paper mentions 'Keras' but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We ran all iterative attack algorithms for P = 300 iterations with step size α = 0.5. For DeepLIFT, we used the pixel-wise and the channel-wise mean images as the CIFAR-10 and ImageNet reference points, respectively. For the integrated gradients method, the same references were used with parameter M = 100. (Hedged sketches of the attack loop and of integrated gradients with these settings follow the table.) |
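
To make the quoted attack settings concrete, below is a minimal sketch of a top-k iterative feature-importance attack in the spirit of Algorithm 1, using the reported P = 300 iterations and step size α = 0.5. Since no code was released, everything else here is an assumption for illustration: the PyTorch framing, the function names (`saliency`, `topk_attack`), the choice of k = 1000 pixels, the L∞ budget `eps`, and the 0-255 pixel range. Gradients of a saliency map through ReLU activations can be uninformative in practice, so a smooth activation approximation may be needed; that detail is omitted here.

```python
# Hedged sketch of a top-k iterative feature-importance attack (not the authors' code).
import torch

def saliency(model, x, label, create_graph=False):
    """Simple-gradient importance map: |d score_label / d x|, summed over channels."""
    score = model(x)[0, label]
    (grad,) = torch.autograd.grad(score, x, create_graph=create_graph)
    return grad.abs().sum(dim=1).flatten()  # one score per pixel, shape (H*W,)

def topk_attack(model, x, label, k=1000, alpha=0.5, eps=8.0, iters=300):
    """Reduce the importance mass on the originally top-k pixels while keeping
    the predicted label fixed and staying inside an L_inf ball of radius eps."""
    x0 = x.clone().detach()
    topk_idx = saliency(model, x0.clone().requires_grad_(True), label).topk(k).indices
    x_adv = x0.clone()
    for _ in range(iters):
        x_in = x_adv.clone().requires_grad_(True)
        # Differentiable objective: importance currently assigned to the original top-k pixels.
        obj = saliency(model, x_in, label, create_graph=True)[topk_idx].sum()
        (g,) = torch.autograd.grad(obj, x_in)
        with torch.no_grad():
            candidate = x_adv - alpha * g.sign()                 # push importance off the top-k set
            candidate = x0 + (candidate - x0).clamp(-eps, eps)   # respect the L_inf budget
            candidate = candidate.clamp(0.0, 255.0)              # assumed 0-255 pixel range
            if model(candidate).argmax(dim=1).item() == label:
                x_adv = candidate                                # keep only label-preserving steps
    return x_adv
```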
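
Likewise, a minimal sketch of the integrated gradients computation with the quoted M = 100 interpolation steps and a mean-image reference; the function name, signature, and PyTorch usage are assumptions, not the authors' implementation.

```python
# Hedged sketch of integrated gradients with M interpolation steps (not the authors' code).
import torch

def integrated_gradients(model, x, label, baseline, M=100):
    """Average the class-score gradients along the straight path from the
    reference (baseline) image to x, then scale by (x - baseline)."""
    total_grad = torch.zeros_like(x)
    for m in range(1, M + 1):
        x_interp = (baseline + (m / M) * (x - baseline)).detach().requires_grad_(True)
        score = model(x_interp)[0, label]
        (g,) = torch.autograd.grad(score, x_interp)
        total_grad += g
    return (x - baseline) * total_grad / M
```

For the settings quoted above, `baseline` would be the pixel-wise mean image for CIFAR-10 and the channel-wise mean image for ImageNet.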