GraphTrail: Translating GNN Predictions into Human-Interpretable Logical Rules

Authors: Burouj Armgaan, Manthan Dalmia, Sourav Medya, Sayan Ranu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments across diverse datasets and GNN architectures demonstrate significant improvement over existing global explainers in mapping GNN predictions to faithful logical formulae.
Researcher Affiliation | Academia | Burouj Armgaan, Manthan Dalmia (Department of Computer Science & Engineering, IIT Delhi, India; csz228001@iitd.ac.in, manthandalmia2@gmail.com); Sourav Medya (Department of Computer Science, University of Illinois, Chicago, USA; medya@uic.edu); Sayan Ranu (Department of Computer Science & Engineering and Yardi School of AI, IIT Delhi, India; sayanranu@cse.iitd.ac.in)
Pseudocode | No | The paper illustrates its pipeline in Figure 1 and describes algorithmic steps in text (e.g., in Sections 3.3 and 3.4 and Appendices B and C), but it does not include any formally labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | The codebase of GRAPHTRAIL is shared at https://github.com/idea-iitd/GraphTrail.
Open Datasets | Yes | We use four benchmark datasets listed in Table C (App. D). While NCI1 [49], MUTAG [17], and Mutagenicity [39, 20] are collections of molecules, BAMultiShapes [55] is a synthetic dataset... (A hedged loading sketch for the molecular benchmarks follows the table.)
Dataset Splits | Yes | Each dataset is split into train-validation-test sets in the proportion of 70:10:20. (A sketch of such a split follows the table.)
Hardware Specification | Yes | All experiments are performed on an Intel Xeon Gold 6248 processor with 96 cores, 1 NVIDIA A100 GPU with 40 GB of memory, and 377 GB of RAM, running Ubuntu 18.04.
Software Dependencies | No | The paper mentions specific software packages such as PyTorch Geometric, a symbolic regression library [8], and the Adam optimizer, but it does not provide version numbers for these dependencies (e.g., PyTorch 1.x, PySR 0.x).
Experiment Setup | Yes | While we benchmark against various GNN architectures and POOL layers (Eq. 4), the default architecture is set to GAT for MUTAG and Mutagenicity and GIN for the other two. We use SUMPOOL as the default across datasets. ... All GNNs have been trained with L = 3 layers. We use the Adam optimizer with a learning rate set to 0.001. Training stops early after a warm-up of 90 epochs if validation accuracy doesn't increase for 100 epochs or a total of 1000 epochs elapse. (A sketch of this training schedule follows the table.)
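
For concreteness, the molecular benchmarks named in the Open Datasets row are all available through PyTorch Geometric's TUDataset wrapper. This is a minimal sketch of loading them; the paper states it uses PyTorch Geometric but does not say TUDataset is the exact loader it relies on, and BAMultiShapes is synthetic and not part of the TU collection.

    # Loading the three molecular benchmarks via PyTorch Geometric.
    # TUDataset downloads each dataset on first use.
    from torch_geometric.datasets import TUDataset

    nci1 = TUDataset(root="data/NCI1", name="NCI1")
    mutag = TUDataset(root="data/MUTAG", name="MUTAG")
    mutagenicity = TUDataset(root="data/Mutagenicity", name="Mutagenicity")

    print(len(mutag), mutag.num_classes)  # graph count and number of classes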
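The Dataset Splits row reports only the 70:10:20 proportions. A minimal sketch of one way to realize that split is below; train_test_split and the fixed seed are assumptions for illustration, not the authors' documented procedure.

    # 70:10:20 train/validation/test split over graph indices.
    from sklearn.model_selection import train_test_split

    indices = list(range(len(mutag)))
    train_idx, rest_idx = train_test_split(indices, train_size=0.7, random_state=0)
    # The remaining 30% is divided 10:20, i.e. one third goes to validation.
    val_idx, test_idx = train_test_split(rest_idx, train_size=1 / 3, random_state=0)

    train_set, val_set, test_set = mutag[train_idx], mutag[val_idx], mutag[test_idx]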
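Finally, the training schedule quoted in the Experiment Setup row translates into a simple loop: Adam at learning rate 0.001, a 90-epoch warm-up, early stopping after 100 epochs without a validation-accuracy gain, and a hard cap of 1000 epochs. In this sketch, model, train_one_epoch, and validation_accuracy are hypothetical stand-ins, since the paper does not publish these helpers.

    # Hedged sketch of the reported schedule; helpers are hypothetical.
    import torch

    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    best_val_acc, stale_epochs = 0.0, 0
    for epoch in range(1000):                  # hard cap of 1000 epochs
        train_one_epoch(model, optimizer)      # one pass over the training set
        val_acc = validation_accuracy(model)
        if val_acc > best_val_acc:
            best_val_acc, stale_epochs = val_acc, 0
        else:
            stale_epochs += 1
        # Patience of 100 epochs applies only after the 90-epoch warm-up.
        if epoch >= 90 and stale_epochs >= 100:
            break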