How Interpretable Are Interpretable Graph Neural Networks?

Authors: Yongqiang Chen, Yatao Bian, Bo Han, James Cheng

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we present a theoretical framework that formulates interpretable subgraph learning with the multilinear extension of the subgraph distribution, coined as the subgraph multilinear extension (SubMT). Extracting the desired interpretable subgraph requires an accurate approximation of SubMT, yet we find that existing XGNNs can have a huge gap in fitting SubMT. Consequently, the SubMT approximation failure leads to degenerated interpretability of the extracted subgraphs. To mitigate the issue, we design a new XGNN architecture called Graph Multilinear neT (GMT), which is provably more powerful in approximating SubMT. We empirically validate our theoretical findings on a number of graph classification benchmarks. The results demonstrate that GMT outperforms the state-of-the-art by up to 10% in terms of both interpretability and generalizability across 12 regular and geometric graph benchmarks.
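For context, if SubMT follows the textbook multilinear extension of a set function applied to subgraphs, it takes the form below. The notation is ours, not the paper's: f is the classifier's output on a subgraph, V is the edge set of the input graph, and each edge e is kept independently with probability p_e given by the subgraph extractor.

```latex
% Multilinear extension of a set function f over a ground set V
% (standard definition; here V is the edge set of the input graph):
\[
  \widetilde{f}(p) \;=\; \mathbb{E}_{S \sim p}\big[f(S)\big]
  \;=\; \sum_{S \subseteq V} f(S) \prod_{e \in S} p_e \prod_{e \notin S} \left(1 - p_e\right).
\]
```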
Researcher Affiliation | Collaboration | Yongqiang Chen*2, Yatao Bian1, Bo Han3, James Cheng2. *Work done during an internship at Tencent AI Lab. 1Tencent AI Lab. 2The Chinese University of Hong Kong. 3Hong Kong Baptist University.
Pseudocode | Yes | Algorithm 1: Practical estimation of counterfactual fidelity. Algorithm 2: Subgraph extractor training algorithm of Graph Multilinear neT (GMT). Algorithm 3: Subgraph classifier training algorithm of Graph Multilinear neT (GMT).
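The released algorithms are not reproduced here, but the core sampling idea the paper describes is to approximate SubMT by averaging the classifier over random subgraphs drawn from the extractor's edge distribution. Below is a minimal Python sketch of that Monte Carlo estimator, assuming a hypothetical classifier(graph, edge_mask) interface; all names are ours, not the released code's.

```python
import torch

def submt_estimate(classifier, graph, edge_prob, n_rounds=20):
    """Monte Carlo estimate of the subgraph multilinear extension (SubMT).

    Samples random subgraphs by keeping each edge independently with its
    learned probability, then averages the classifier's predictions.
    `classifier(graph, edge_mask)` is a hypothetical interface.
    """
    preds = []
    for _ in range(n_rounds):
        # Bernoulli edge mask drawn from the extractor's subgraph distribution.
        mask = torch.bernoulli(edge_prob)
        preds.append(classifier(graph, mask))
    return torch.stack(preds).mean(dim=0)
```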
Open Source Code | Yes | Our code is available at https://github.com/LFhase/GMT.
Open Datasets | Yes | We consider both the regular and geometric graph classification benchmarks following the XGNN literature (Miao et al., 2022; 2023). For regular graphs, we include BA-2MOTIFS (Luo et al., 2020), MUTAG (Debnath et al., 1991), and MNIST-75SP (Knyazev et al., 2019), which are widely evaluated by post-hoc explanation approaches (Yuan et al., 2020b), as well as SPURIOUS-MOTIF (Wu et al., 2022b), GRAPH-SST2 (Socher et al., 2013; Yuan et al., 2020b), and OGBG-MOLHIV (Hu et al., 2020), where there exist various graph distribution shifts. For geometric graphs, we consider ACTSTRACK, TAU3MU, SYNMOL, and PLBIND, curated by Miao et al. (2023).
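As a concrete starting point, some of these benchmarks are obtainable through standard PyTorch Geometric loaders; MUTAG, for instance, ships with TUDataset. Note the paper may use its own curated variants, so this is only illustrative.

```python
from torch_geometric.datasets import TUDataset

# Standard TUDataset loader; the paper's curated variant may differ.
dataset = TUDataset(root='data/MUTAG', name='MUTAG')
print(len(dataset), dataset.num_classes)
```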
Dataset Splits | Yes | Dataset splits. We follow previous works (Luo et al., 2020; Miao et al., 2022) to split BA-2MOTIFS randomly into three sets (80%/10%/10%), and MUTAG randomly into 80%/20% train and validation sets, where the test data are the mutagen molecules with -NO2 or -NH2. We use the default split for MNIST-75SP given by Knyazev et al. (2019) with a smaller sampling size, following Miao et al. (2022). We use the default splits for the GRAPH-SST2 (Yuan et al., 2020b), SPURIOUS-MOTIF (Wu et al., 2022b), and OGBG-MOLHIV (Hu et al., 2020) datasets. For the geometric datasets, we use the author-provided default splits.
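The 80%/10%/10% random split can be reproduced along these lines. This is a minimal sketch assuming a PyTorch Geometric in-memory dataset; the released code may fix seeds and index files differently.

```python
import torch

def random_split(dataset, frac_train=0.8, frac_val=0.1, seed=0):
    """Shuffle indices and cut them into train/val/test partitions."""
    g = torch.Generator().manual_seed(seed)
    perm = torch.randperm(len(dataset), generator=g)
    n_train = int(frac_train * len(dataset))
    n_val = int(frac_val * len(dataset))
    return (dataset[perm[:n_train]],
            dataset[perm[n_train:n_train + n_val]],
            dataset[perm[n_train + n_val:]])
```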
Hardware Specification | Yes | We ran our experiments on Linux Servers installed with V100 graphics cards and CUDA 11.3.
Software Dependencies | Yes | We implement our methods with PyTorch (Paszke et al., 2019) and PyTorch Geometric (Fey & Lenssen, 2019) 2.0.4. We ran our experiments on Linux Servers installed with V100 graphics cards and CUDA 11.3.
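A quick way to check a local environment against the reported stack (PyTorch, PyTorch Geometric 2.0.4, CUDA 11.3) is the minimal snippet below; nothing here comes from the paper's repository.

```python
import torch
import torch_geometric

# Versions reported by the paper: PyG 2.0.4 and CUDA 11.3 on V100 GPUs.
print('torch:', torch.__version__)
print('torch_geometric:', torch_geometric.__version__)
print('cuda available:', torch.cuda.is_available(), '| cuda:', torch.version.cuda)
```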
Experiment Setup | Yes | We tune the hyperparameters as recommended by previous works. More details are given in Appendix F.2. In Appendix F.2: We search for the hyperparameter r from [r0 - 0.1, r0, r0 + 0.1]... We search the weights of the graph information regularizers from [0.1, 0.5, 1, 2] for regular graphs and from [0.01, 0.1, 1] for geometric datasets... We search for the sampling rounds from [1, 20, 40, 80, 100, 200] when memory allows.
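Put together, the reported search spaces form a small grid. The sweep skeleton below is hypothetical: train_and_validate and the base value r0 are placeholders of ours, while the lists are quoted from Appendix F.2.

```python
from itertools import product

# Search spaces quoted from Appendix F.2; r0 is a dataset-specific base
# ratio, set here to a placeholder value for illustration.
r0 = 0.5
ratios = [r0 - 0.1, r0, r0 + 0.1]
reg_weights = [0.1, 0.5, 1, 2]               # regular graphs; [0.01, 0.1, 1] for geometric
sampling_rounds = [1, 20, 40, 80, 100, 200]  # subject to memory limits

for r, w, k in product(ratios, reg_weights, sampling_rounds):
    config = {'r': r, 'reg_weight': w, 'rounds': k}
    # train_and_validate(config)  # hypothetical entry point into the training code
    print(config)
```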