Relevant Walk Search for Explaining Graph Neural Networks

Authors: Ping Xiong, Thomas Schnake, Michael Gastegger, Grégoire Montavon, Klaus-Robert Müller, Shinichi Nakajima

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate the performance of our algorithms at scale and their utility across application domains, i.e., on epidemiology, molecular, and natural language benchmarks. We provide our codes under github.com/xiong-ping/rel_walk_gnnlrp.
Researcher Affiliation | Collaboration | (1) Technische Universität Berlin (TU Berlin); (2) BIFOLD Berlin Institute for the Foundations of Learning and Data; (3) Freie Universität Berlin (FU Berlin); (4) Department of Artificial Intelligence, Korea University, Seoul 136-713, Korea; (5) Max Planck Institut für Informatik, 66123 Saarbrücken, Germany; (6) Google Research, Brain team, Berlin; (7) RIKEN Center for AIP, Japan.
Pseudocode | Yes | Algorithm 1: Find the highest absolute relevant neuron-level walk (EMP-neu-Basic). Algorithm 2: Search space splitting for finding the top K most relevant neuron-level walks. Algorithm 3: Find the most relevant node-level walk approximately by averaging (AMP-ave-Basic). Algorithm 4: Search space splitting for approximately finding the top K most relevant node-level walks. (A hedged sketch of this max-product walk search appears after the table.)
Open Source Code | Yes | We provide our codes under github.com/xiong-ping/rel_walk_gnnlrp.
Open Datasets | Yes | We use common benchmark datasets including BA-2motif (Luo et al., 2020), MUTAG (Debnath et al., 1991), Mutagenicity (Kazius et al., 2005), and Graph-SST2 (Yuan et al., 2020b).
Dataset Splits | Yes | We trained the model with a set of 400 positive and 400 negative samples, and use the rest as the test set (BA-2motif, Appendix F.1). The train set consists of 108 samples, half positive and half negative, and we use the remaining samples as the test set (MUTAG, Appendix F.2). The train set has 3096 samples, half positive and half negative, and the rest are used as the test set (Mutagenicity, Appendix F.3). We downloaded the dataset from Yuan et al. (2020b) and used their dataset split (Graph-SST2, Appendix F.4). We generated 100 samples (scenarios)... and trained an L-layered GCN with 80 samples and tested on the other 20 samples (Infection, Section 4.1).
Hardware Specification | Yes | Table 1 shows computation time (on an M1 Pro CPU) of explanation methods on the BA-2motif and Infection datasets.
Software Dependencies | No | The paper mentions optimizers (SGD, Adam) and model types (GIN, GCN) but does not provide specific version numbers for software libraries or frameworks (e.g., PyTorch, TensorFlow) used in the experiments.
Experiment Setup | Yes | We trained a GIN model with 3 layers, with a 2-layer multi-layer perceptron as the combine function in every GIN block. The activation function used throughout the model is ReLU. The nodes' initial embedding is the single value 1... We trained the model with the SGD optimizer with a decreasing learning rate γ = 0.00001/(1.0 + epoch/epochs) for 5000 epochs (Appendix F.1). (A PyTorch sketch of this setup follows the table.)
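To make the Pseudocode row concrete: the EMP algorithms search for the walk whose layer-wise relevance contributions have the largest absolute product. Below is a minimal Viterbi-style max-product sketch of that idea, assuming the per-layer contributions are already available as dense matrices. The names `transitions` and `most_relevant_walk` are illustrative; the paper's actual Algorithms 1-2 operate on GNN-LRP relevances and add search-space splitting for top-K retrieval, which this sketch omits.

```python
import numpy as np

def most_relevant_walk(transitions):
    """Viterbi-style max-product search for the single walk with the
    largest absolute product of per-layer contribution scores.

    transitions: list of L matrices; transitions[l][i, j] is a
    (hypothetical) contribution score of unit i at layer l to unit j
    at layer l+1. Scores are assumed to multiply along a walk.
    """
    n0 = transitions[0].shape[0]
    best = np.ones(n0)   # best |product| of any walk ending at each unit
    back = []            # backpointers, one array per layer
    for T in transitions:
        scores = best[:, None] * np.abs(T)   # score of every one-step extension
        back.append(scores.argmax(axis=0))   # best predecessor for each unit
        best = scores.max(axis=0)
    # Trace the winning walk backwards through the backpointers.
    j = int(best.argmax())
    walk = [j]
    for bp in reversed(back):
        j = int(bp[j])
        walk.append(j)
    return walk[::-1], float(best.max())

# Example: a 3-layer network with random contribution matrices.
rng = np.random.default_rng(0)
walk, score = most_relevant_walk([rng.standard_normal((4, 4)) for _ in range(3)])
print(walk, score)   # the node indices along the walk and its |product|
```

The dynamic program is valid here because the absolute product of nonnegative factors decomposes layer by layer, which is what lets the search avoid enumerating the exponentially many walks.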
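For the Experiment Setup row, here is a minimal self-contained PyTorch sketch of the described configuration: 3 GIN-style blocks, each with a 2-layer MLP and ReLU, scalar initial node embeddings of 1, and SGD with the decreasing learning rate γ = 1e-5/(1 + epoch/epochs) for 5000 epochs. The dense adjacency, hidden width, sum-pooling readout, and toy graph are our assumptions for illustration, not details from the paper.

```python
import torch
import torch.nn as nn

class GINBlock(nn.Module):
    """One GIN-style block: sum-aggregate neighbours (dense adjacency
    with self-loops), then a 2-layer MLP with ReLU activations."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(d_in, d_out), nn.ReLU(),
                                 nn.Linear(d_out, d_out), nn.ReLU())

    def forward(self, h, adj):
        return self.mlp(adj @ h)

class GIN(nn.Module):
    def __init__(self, d_hidden=16, n_layers=3, n_classes=2):
        super().__init__()
        dims = [1] + [d_hidden] * n_layers  # initial node embedding is the scalar 1
        self.blocks = nn.ModuleList(GINBlock(a, b) for a, b in zip(dims, dims[1:]))
        self.readout = nn.Linear(d_hidden, n_classes)

    def forward(self, h, adj):
        for block in self.blocks:
            h = block(h, adj)
        return self.readout(h.sum(dim=0))  # sum pooling over nodes

model = GIN()
epochs = 5000
opt = torch.optim.SGD(model.parameters(), lr=1.0)
# Decreasing schedule from Appendix F.1: gamma = 1e-5 / (1 + epoch/epochs).
sched = torch.optim.lr_scheduler.LambdaLR(
    opt, lr_lambda=lambda epoch: 1e-5 / (1.0 + epoch / epochs))

# Toy usage on a random 5-node graph (graph and label are placeholders).
adj = ((torch.rand(5, 5) > 0.7).float() + torch.eye(5)).clamp(max=1.0)
h0 = torch.ones(5, 1)
for _ in range(epochs):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(h0, adj).unsqueeze(0),
                                       torch.tensor([0]))
    loss.backward()
    opt.step()
    sched.step()
```

Setting the optimizer's base rate to 1.0 and putting the whole schedule in the `LambdaLR` multiplier reproduces the quoted γ exactly at every epoch.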