Explanations can be manipulated and geometry is to blame

Authors: Ann-Kathrin Dombrowski, Maximilian Alber, Christopher Anders, Marcel Ackermann, Klaus-Robert Müller, Pan Kessel

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose an algorithm which allows to manipulate an image with a hardly perceptible perturbation such that the explanation matches an arbitrary target map. We demonstrate its effectiveness for six different explanation methods and on four network architectures as well as two datasets. We provide a theoretical understanding of this phenomenon for gradient-based methods in terms of differential geometry. We demonstrate experimentally that smoothing leads to increased robustness not only for gradient but also for propagation-based methods.
Researcher Affiliation | Academia | Ann-Kathrin Dombrowski [1], Maximilian Alber [5], Christopher J. Anders [1], Marcel Ackermann [2], Klaus-Robert Müller [1,3,4], Pan Kessel [1]. [1] Machine Learning Group, Technische Universität Berlin, Germany; [2] Department of Video Coding & Analytics, Fraunhofer Heinrich-Hertz-Institute, Berlin, Germany; [3] Max-Planck-Institut für Informatik, Saarbrücken, Germany; [4] Department of Brain and Cognitive Engineering, Korea University, Seoul, Korea; [5] Charité Berlin, Berlin, Germany
Pseudocode | No | The paper describes the optimization process and the components of the loss function, but it does not present a formal pseudocode block or algorithm box.
Open Source Code | Yes | We have uploaded the results of all runs so that interested readers can assess their similarity themselves and provide code to reproduce them: https://github.com/pankessel/adv_explanation_ref
Open Datasets | Yes | We use a pre-trained VGG-16 network [29] and the ImageNet dataset [30]. Moreover, we also successfully tested our algorithm on the CIFAR-10 dataset [34].
Dataset Splits | Yes | In each step, the network's performance is evaluated on the complete ImageNet validation set.
Hardware Specification | No | The paper does not specify the exact hardware (e.g., GPU model, CPU, memory) used for running the experiments.
Software Dependencies | No | The paper mentions network architectures (VGG-16, ResNet-18, AlexNet, DenseNet-121) and non-linearities (ReLU, softplus), but it does not provide specific software versions for libraries, frameworks, or programming languages (e.g., PyTorch 1.x, TensorFlow 2.x, Python 3.x).
Experiment Setup | Yes | We obtain such manipulations by optimizing the loss function $\mathcal{L} = \lVert h(x_{\text{adv}}) - h^{t} \rVert^{2} + \gamma \, \lVert g(x_{\text{adv}}) - g(x) \rVert^{2}$ (4) with respect to $x_{\text{adv}}$ using gradient descent. We clamp $x_{\text{adv}}$ after each iteration so that it is a valid image. The relative weighting of these two summands is controlled by the hyperparameter $\gamma \in \mathbb{R}^{+}$. ... using a few hundred iterations of gradient descent. ... The precise value of $\beta$ is a hyperparameter of the method, but we find that a value around one works well in practice.
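To make the quoted setup concrete, the following is a minimal PyTorch sketch of the optimization in Eq. (4) for a gradient (saliency) explanation, including a softplus replacement of ReLU, which is the smoothing that the $\beta$ hyperparameter refers to. The function names, the Adam optimizer, the clamping range, and the default values of gamma, beta, the learning rate, and the number of steps are illustrative assumptions, not the authors' reference implementation (see the repository linked under Open Source Code for that).

```python
# Hedged sketch of the manipulation in Eq. (4); all names and defaults are placeholders.
import copy
import torch
import torch.nn as nn


def softplus_copy(model, beta=10.0):
    """Return a copy of `model` with every ReLU replaced by softplus.

    Eq. (4) requires second derivatives of the network w.r.t. the input, which
    vanish for ReLU; softplus with finite beta smooths the non-linearity
    (larger beta approximates ReLU more closely). beta=10.0 is a placeholder.
    """
    smooth = copy.deepcopy(model)
    for module in smooth.modules():
        for child_name, child in module.named_children():
            if isinstance(child, nn.ReLU):
                setattr(module, child_name, nn.Softplus(beta=beta))
    return smooth


def gradient_explanation(model, x):
    """Explanation map h(x): gradient of the predicted-class score w.r.t. x (batch size 1)."""
    if not x.requires_grad:
        x = x.clone().requires_grad_(True)
    logits = model(x)
    score = logits[0, logits.argmax(dim=1).item()]
    (grad,) = torch.autograd.grad(score, x, create_graph=True)
    return grad


def manipulate(model, x, target_map, gamma=1.0, lr=1e-3, steps=500):
    """Optimise x_adv so h(x_adv) matches target_map while g(x_adv) stays close to g(x)."""
    smooth_model = softplus_copy(model)
    g_x = model(x).detach()                       # original network output g(x)
    x_adv = x.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([x_adv], lr=lr)  # plain gradient descent also works
    for _ in range(steps):
        optimizer.zero_grad()
        h_adv = gradient_explanation(smooth_model, x_adv)
        loss = ((h_adv - target_map) ** 2).sum() \
             + gamma * ((smooth_model(x_adv) - g_x) ** 2).sum()
        loss.backward()
        optimizer.step()
        with torch.no_grad():                     # clamp so x_adv remains a valid image
            x_adv.clamp_(0.0, 1.0)                # [0, 1] assumes unnormalised inputs
    return x_adv.detach()
```

In this sketch h is the input gradient of the predicted-class score; the same loop would apply to the paper's other explanation methods by swapping gradient_explanation for the corresponding map.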