Learning how to explain neural networks: PatternNet and PatternAttribution
Authors: Pieter-Jan Kindermans, Kristof T. Schütt, Maximilian Alber, Klaus-Robert Müller, Dumitru Erhan, Been Kim, Sven Dähne
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We analyze the performance of existing explanation approaches in the controlled setting of a linear model (Sections 2 and 3). We propose two novel explanation methods, PatternNet and PatternAttribution, that alleviate shortcomings of current approaches, as discovered during our analysis, and improve explanations in real-world deep neural networks visually and quantitatively (Sections 4 and 5). To evaluate the quality of the explanations, we focus on the task of image classification. We used Theano (Bergstra et al., 2010) and Lasagne (Dieleman et al., 2015) for our implementation. We restrict the analysis to the well-known ImageNet dataset (Russakovsky et al., 2015) using the pre-trained VGG-16 model (Simonyan & Zisserman, 2015). |
| Researcher Affiliation | Collaboration | Pieter-Jan Kindermans, Google Brain, pikinder@google.com. Kristof T. Schütt & Maximilian Alber, TU Berlin, {kristof.schuett,maximilian.alber}@tu-berlin.de. Klaus-Robert Müller, TU Berlin, klaus-robert.mueller@tu-berlin.de. Dumitru Erhan & Been Kim, Google Brain, {dumitru,beenkim}@google.com. Sven Dähne, TU Berlin, sven.daehne@tu-berlin.de. Part of this work was done at TU Berlin; part of the work was part of the Google Brain Residency program. KRM is also with Korea University and the Max Planck Institute for Informatics, Saarbrücken, Germany. Sven Dähne is now at Amazon. |
| Pseudocode | Yes | A ALGORITHMS In this section we will give an overview of the visualization algorithms to clarify their actual implementation for ReLU networks. This shows the similarities and the differences between all approaches. For all visualization approaches, the back-projection through a max-pooling layer is only through the path that was active in the forward pass. A.1 FUNCTION VISUALISATION A.1.1 GRADIENT WITH RESPECT TO THE INPUT ... A.2 SIGNAL VISUALIZATION A.2.1 DECONVNET ... A.3 ATTRIBUTION VISUALIZATION A.3.1 DEEP-TAYLOR DECOMPOSITION |
| Open Source Code | No | The paper states that, for the comparison to prediction-difference analysis, the authors used "the open-source code provided by the authors" (referring to Zintgraf et al. (2017)), but it does not provide an explicit statement or link for the source code of their own implementation. |
| Open Datasets | Yes | We restrict the analysis to the well-known ImageNet dataset (Russakovsky et al., 2015) using the pre-trained VGG-16 model (Simonyan & Zisserman, 2015). |
| Dataset Splits | Yes | The signal estimators are trained on the first half of the training dataset. The vector v, used to measure the quality of the signal estimator ρ(x) in Eq. (1), is optimized on the second half of the training dataset. All the results presented here were obtained using the official validation set of 50,000 samples. The validation set was used neither for training the signal estimators nor for training the vector v to measure the quality. Consequently, our results are obtained on previously unseen data. |
| Hardware Specification | Yes | This was implemented on an NVIDIA Tesla K40 and took about 24 hours per optimized signal estimator. |
| Software Dependencies | No | We used Theano (Bergstra et al., 2010) and Lasagne (Dieleman et al., 2015) for our implementation. We optimize the equivalent least-squares problem using stochastic mini-batch gradient descent with Adam (Kingma & Ba, 2015) until convergence. While software packages are mentioned, specific version numbers for Theano and Lasagne are not provided. |
| Experiment Setup | Yes | Images were rescaled and cropped to 224×224 pixels. The signal estimators are trained on the first half of the training dataset. We optimize the equivalent least-squares problem using stochastic mini-batch gradient descent with Adam (Kingma & Ba, 2015) until convergence. |
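The "equivalent least-squares problem" quoted above refers to fitting, for each neuron, a linear signal pattern a such that a·y best reconstructs the input x from the neuron output y = wᵀx. A minimal NumPy sketch of this linear pattern estimator (toy data and variable names are assumptions; the paper's actual implementation uses Theano/Lasagne and mini-batch Adam rather than the closed form shown here):

```python
import numpy as np

rng = np.random.default_rng(0)

d = 5                            # input dimensionality of one neuron (toy size)
w = rng.normal(size=d)           # hypothetical weight vector of the neuron
X = rng.normal(size=(1000, d))   # toy input data, rows are samples
y = X @ w                        # neuron output for each sample

# Linear pattern estimator: a = cov(x, y) / var(y).
# Per input dimension i, a_i is the least-squares solution of
# minimizing E[(x_i - a_i * y)^2] over the (centered) data.
y_c = y - y.mean()
X_c = X - X.mean(axis=0)
a = (X_c * y_c[:, None]).mean(axis=0) / y_c.var()
```

For a full network, this fit is repeated per neuron on half of the training data, which is why the paper reports roughly 24 hours per optimized signal estimator on a K40.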