Explanations can be manipulated and geometry is to blame
Authors: Ann-Kathrin Dombrowski, Maximilian Alber, Christopher Anders, Marcel Ackermann, Klaus-Robert Müller, Pan Kessel
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose an algorithm which allows one to manipulate an image with a hardly perceptible perturbation such that the explanation matches an arbitrary target map. We demonstrate its effectiveness for six different explanation methods, on four network architectures, and on two datasets. We provide a theoretical understanding of this phenomenon for gradient-based methods in terms of differential geometry. We demonstrate experimentally that smoothing leads to increased robustness not only for gradient but also for propagation-based methods. (The attack is sketched in code after this table.) |
| Researcher Affiliation | Academia | Ann-Kathrin Dombrowski¹, Maximilian Alber⁵, Christopher J. Anders¹, Marcel Ackermann², Klaus-Robert Müller¹,³,⁴, Pan Kessel¹. Affiliations: ¹Machine Learning Group, Technische Universität Berlin, Germany; ²Department of Video Coding & Analytics, Fraunhofer Heinrich-Hertz-Institute, Berlin, Germany; ³Max-Planck-Institut für Informatik, Saarbrücken, Germany; ⁴Department of Brain and Cognitive Engineering, Korea University, Seoul, Korea; ⁵Charité Berlin, Berlin, Germany |
| Pseudocode | No | The paper describes the optimization process and the components of the loss function, but it does not present a formal pseudocode block or algorithm box. |
| Open Source Code | Yes | We have uploaded the results of all runs so that interested readers can assess their similarity themselves, and provide code to reproduce them: https://github.com/pankessel/adv_explanation_ref |
| Open Datasets | Yes | We use a pre-trained VGG-16 network [29] and the ImageNet dataset [30]. Moreover, we also successfully tested our algorithm on the CIFAR-10 dataset [34]. |
| Dataset Splits | Yes | In each step, the network's performance is evaluated on the complete ImageNet validation set. |
| Hardware Specification | No | The paper does not specify the exact hardware (e.g., GPU model, CPU, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions network architectures (VGG-16, ResNet-18, AlexNet, DenseNet-121) and non-linearities (ReLU, softplus), but it does not provide specific software versions for libraries, frameworks, or programming languages (e.g., PyTorch 1.x, TensorFlow 2.x, Python 3.x). |
| Experiment Setup | Yes | We obtain such manipulations by optimizing the loss function $\mathcal{L} = \left\lVert h(x_{\mathrm{adv}}) - h^{t} \right\rVert^{2} + \gamma \left\lVert g(x_{\mathrm{adv}}) - g(x) \right\rVert^{2}$ (Eq. 4) with respect to $x_{\mathrm{adv}}$ using gradient descent. We clamp $x_{\mathrm{adv}}$ after each iteration so that it is a valid image. The relative weighting of the two summands is controlled by the hyperparameter $\gamma \in \mathbb{R}_{+}$. ... using a few hundred iterations of gradient descent. ... The precise value of $\beta$ is a hyperparameter of the method, but we find that a value around one works well in practice. (Both the attack loss and the $\beta$-smoothing are sketched in code below.) |
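For concreteness, here is a minimal PyTorch sketch of the optimization in Eq. (4). This is our illustration, not the authors' implementation (their code is in the repository linked above): the names `explanation` and `manipulate` and the values of `gamma`, `lr`, and `n_iter` are assumptions, and the sketch uses only the gradient ("saliency") explanation, one of the six methods the paper covers.

```python
# Sketch of the manipulation attack of Eq. (4), assuming a classifier `model`
# with smooth non-linearities (see the softplus swap below): the loss
# differentiates through the gradient explanation, so second derivatives
# must not vanish as they do for ReLU networks.
import torch

def explanation(model, x):
    """Gradient ('saliency') explanation h(x): gradient of the top logit w.r.t. x."""
    if not x.requires_grad:
        x = x.clone().requires_grad_(True)
    logits = model(x)
    k = logits[0].argmax()
    # create_graph=True lets the attack backpropagate through h itself
    return torch.autograd.grad(logits[0, k], x, create_graph=True)[0]

def manipulate(model, x, h_target, gamma=1e3, lr=1e-3, n_iter=500):
    """Find x_adv whose explanation matches h_target while g(x_adv) stays close to g(x)."""
    g_x = model(x).detach()                       # original network output g(x)
    x_adv = x.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([x_adv], lr=lr)
    for _ in range(n_iter):                       # "a few hundred iterations"
        opt.zero_grad()
        loss = ((explanation(model, x_adv) - h_target) ** 2).sum() \
             + gamma * ((model(x_adv) - g_x) ** 2).sum()
        loss.backward()
        opt.step()
        with torch.no_grad():
            x_adv.clamp_(0.0, 1.0)                # keep x_adv a valid image
    return x_adv.detach()
```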
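The $\beta$ quoted in the Experiment Setup row is the softplus parameter: ReLU activations are replaced by $\mathrm{softplus}_{\beta}(x) = \frac{1}{\beta}\log(1 + e^{\beta x})$, which approaches ReLU as $\beta \to \infty$, and the same smoothing underlies the increased robustness quoted in the Research Type row. A hypothetical helper for the swap (the name `replace_relu_with_softplus` is ours, not from the paper's code):

```python
import torch.nn as nn

def replace_relu_with_softplus(model: nn.Module, beta: float = 1.0) -> nn.Module:
    """Recursively swap every nn.ReLU in the model for nn.Softplus(beta)."""
    for name, child in model.named_children():
        if isinstance(child, nn.ReLU):
            setattr(model, name, nn.Softplus(beta=beta))
        else:
            replace_relu_with_softplus(child, beta)
    return model
```

Per the quote above, the paper finds $\beta$ around one to work well in practice; large $\beta$ recovers the original ReLU behaviour, so the smoothed network's predictions stay close to the original's.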