Global Explainability of GNNs via Logic Combination of Learned Concepts

Authors: Steve Azzolin, Antonio Longa, Pietro Barbiero, Pietro Liò, Andrea Passerini

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted an experimental evaluation on synthetic and real-world datasets aimed at answering the following research questions:
Researcher Affiliation | Academia | University of Trento, University of Cambridge, Fondazione Bruno Kessler
Pseudocode | No | The paper describes the proposed method step by step in narrative text and with mathematical formulas, but it does not include any clearly labeled 'Pseudocode' or 'Algorithm' block or figure.
Open Source Code | Yes | The source code of GLGExplainer, including the extraction of local explanations, as well as the datasets and all the code for reproducing the results is made freely available online: https://github.com/steveazzolin/gnn_logic_global_expl
Open Datasets | Yes | The source code of GLGExplainer, including the extraction of local explanations, as well as the datasets and all the code for reproducing the results is made freely available online: https://github.com/steveazzolin/gnn_logic_global_expl
Dataset Splits | Yes | Table 2: Mean and standard deviation for Fidelity, Accuracy, and Concept Purity computed over 5 runs with different random seeds. Since the Concept Purity is computed for every cluster independently, here we report mean and standard deviation across clusters over the best run according to the validation set.
Hardware Specification | No | The paper describes the model architectures (e.g., 2-layer GIN, 3-layer GCN) and training procedures, but it does not specify any particular hardware components (e.g., GPU models, CPU types, memory) used to run the experiments. It only mentions 'we trained our own networks'.
Software Dependencies | No | The paper mentions software components such as 'PGExplainer', 'GIN', 'GATv2', the ADAM optimizer, and a PyTorch implementation, but it does not provide specific version numbers for these dependencies, which are necessary for full reproducibility.
Experiment Setup | Yes | We set the number of prototypes m to 6, 2, and 4 for BAMultiShapes, Mutagenicity, and HIN respectively (see Section 4.4 for an analysis showing how these numbers were inferred), keeping the dimensionality d at 10. We trained using the ADAM optimizer with early stopping, with a learning rate of 1e-3 for the embedding and prototype learning components and a learning rate of 5e-4 for the E-LEN. The batch size was set to 128, the focusing parameter γ to 2, while the auxiliary loss coefficients λ1 and λ2 were set to 0.09 and 0.00099, respectively. The E-LEN consists of an input Entropy Layer (R^m -> R^10), a hidden layer (R^10 -> R^5), and an output layer with a Leaky ReLU activation function.
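
As a concrete illustration of the Experiment Setup row above, the following is a minimal PyTorch sketch of that configuration for the BAMultiShapes setting (m = 6, d = 10). It is a sketch under assumptions, not the authors' implementation: the input Entropy Layer is stood in for by a plain linear layer, the output width of 1 is assumed, and module names such as `prototype_component` and `e_len` are hypothetical.

```python
import torch
import torch.nn as nn

m, d = 6, 10  # prototypes for BAMultiShapes and embedding dimensionality, as reported

# Stand-in for the embedding / prototype-learning component (hypothetical layout).
prototype_component = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, m))

# Stand-in for the E-LEN head: R^m -> R^10 -> R^5 -> output, Leaky ReLU at the end.
# A plain nn.Linear replaces the input Entropy Layer; output width 1 is an assumption.
e_len = nn.Sequential(
    nn.Linear(m, 10),
    nn.Linear(10, 5),
    nn.Linear(5, 1),
    nn.LeakyReLU(),
)

# Two parameter groups mirror the two learning rates quoted above (1e-3 and 5e-4).
optimizer = torch.optim.Adam([
    {"params": prototype_component.parameters(), "lr": 1e-3},
    {"params": e_len.parameters(), "lr": 5e-4},
])

batch_size = 128  # as reported; focal-loss gamma = 2, lambda1 = 0.09, lambda2 = 0.00099
```

The two-group optimizer is simply one way to express the different learning rates the paper assigns to the embedding/prototype components and to the E-LEN.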
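On the Software Dependencies point, a reproduction would normally begin by recording the versions the paper leaves unstated. A minimal sketch, assuming a PyTorch / PyTorch Geometric stack (only the PyTorch implementation is confirmed by the paper; torch_geometric is an assumption):

```python
import platform

import torch
import torch_geometric  # assumption: GIN/GATv2 models are commonly built on PyTorch Geometric

# Record the environment so a rerun can be compared against the original setup.
print("python         :", platform.python_version())
print("torch          :", torch.__version__)
print("torch_geometric:", torch_geometric.__version__)
print("cuda available :", torch.cuda.is_available())
```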
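The Dataset Splits row quotes the aggregation protocol of Table 2: metrics averaged over 5 runs with different random seeds, with the best run chosen on the validation set. A small sketch of that computation, using purely hypothetical per-seed scores:

```python
import numpy as np

# Hypothetical per-seed scores for 5 runs; these are NOT values from the paper.
val_accuracy = np.array([0.91, 0.93, 0.90, 0.92, 0.94])
test_accuracy = np.array([0.89, 0.92, 0.88, 0.91, 0.93])

# Mean and standard deviation across seeds, as reported in Table 2.
print(f"test accuracy: {test_accuracy.mean():.3f} +/- {test_accuracy.std():.3f}")

# Model selection: the best run according to the validation set.
best_run = int(val_accuracy.argmax())
print(f"best run by validation accuracy: run {best_run}")
```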