Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

GnnXemplar: Exemplars to Explanations - Natural Language Rules for Global GNN Interpretability

Authors: Burouj Armgaan, Eshan Jain, Harsh Pandey, Mahesh Chandran, Sayan Ranu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments across diverse benchmarks show that GnnXemplar significantly outperforms existing methods in fidelity, scalability, and human interpretability, as validated by a user study with 60 participants.
Researcher Affiliation	Collaboration	Burouj Armgaan, Dept. of CSE, IIT Delhi; Eshan Jain, Dept. of CSE, IIT Delhi; Harsh Pandey, Fujitsu Research of India, Bangalore; Mahesh Chandran, Fujitsu Research of India, Bangalore; Sayan Ranu, Dept. of CSE and Yardi ScAI, IIT Delhi
Pseudocode	Yes	Algorithm 1 Greedy Node Selection. Require: Graph G, budget b, Rev-k-NN sets of nodes. Ensure: exemplar set A of size b. 1: A ← ∅ 2: while |A| < b do 3: v* ← arg max_{v ∈ V_tr \ A} Π(A ∪ {v}) 4: A ← A ∪ {v*} 5: Return A
Open Source Code	Yes	Our codebase is shared at https://github.com/idea-iitd/GnnXemplar.git.
Open Datasets	Yes	Datasets: Table 1 presents the 8 benchmark datasets we use. Wherever available, we adopt the standard train/validation/test splits from PyTorch Geometric or the original data releases, preserving class balance. We train a GAT for TAGCora and GCN for the rest.
Dataset Splits	Yes	Wherever available, we adopt the standard train/validation/test splits from PyTorch Geometric or the original data releases, preserving class balance.
Hardware Specification	Yes	All experiments were performed on Intel(R) Xeon(R) Gold 6426Y: 64 cores, 126 GB RAM, 2 NVIDIA-L40S GPUs, 45 GiB each, running on Ubuntu 20.04.6 LTS.
Software Dependencies	No	All experiments were performed on Intel(R) Xeon(R) Gold 6426Y: 64 cores, 126 GB RAM, 2 NVIDIA-L40S GPUs, 45 GiB each, running on Ubuntu 20.04.6 LTS. Then for sec. 3.4 we used the gemini-1.5-flash LLM from Google's generativeai package. For training all the GNNs we use the standard training pipeline of PyTorch, which closely resembles what all the above models use in their implementations. The text does not provide specific version numbers for software libraries such as PyTorch or the generativeai package.
Experiment Setup	Yes	We train all the models for 250 epochs and choose the learning rates from the set {0.005, 0.01} with the Adam optimizer and a weight decay of 5e-4. In sec. 3.2 we first picked k for k-NN and Reverse k-NN; for all the datasets in the experiments we choose k = 5. Next, in sec. 3.3, for coverage maximization, instead of selecting a number of points we set the stopping criterion based on the coverage of points, stopping when the coverage reaches 95%. Then, for sampling the positive and negative points used in self-refinement, we set the sample size to 50 each. We split these samples into training and validation sets in the ratio 6 : 4. Finally, for the self-refinement process, we stop iterating when either the accuracy ≥ 95 or the number of iterations reaches 5.
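
The greedy selection quoted in the Pseudocode row, together with the coverage-based stopping rule quoted in the Experiment Setup row, can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name greedy_select, its parameters, and the toy input are hypothetical, and the objective Π is assumed here to be the number of nodes covered by the union of the selected exemplars' Rev-k-NN sets, which the quoted text does not spell out. The set of nodes to cover is likewise assumed to be every node that appears in some Rev-k-NN set.

```python
from typing import Dict, List, Optional, Set


def greedy_select(
    rev_knn: Dict[int, Set[int]],
    budget: Optional[int] = None,
    coverage_target: Optional[float] = None,
) -> List[int]:
    """Greedily pick exemplars under an assumed coverage objective.

    rev_knn maps each candidate node to its Rev-k-NN set (hypothetical input
    format). Selection stops at `budget` exemplars, or once the covered
    fraction reaches `coverage_target`, whichever comes first.
    """
    if budget is None and coverage_target is None:
        raise ValueError("provide budget, coverage_target, or both")

    # Assumption: the nodes to cover are those appearing in some Rev-k-NN set.
    universe: Set[int] = set().union(*rev_knn.values())
    selected: List[int] = []
    covered: Set[int] = set()
    candidates = sorted(rev_knn)  # sorted so ties break toward the smallest id

    while candidates:
        if budget is not None and len(selected) >= budget:
            break
        if coverage_target is not None and len(covered) >= coverage_target * len(universe):
            break
        # Marginal gain of v = number of not-yet-covered nodes it would add.
        best = max(candidates, key=lambda v: len(rev_knn[v] - covered))
        selected.append(best)
        covered |= rev_knn[best]
        candidates.remove(best)
    return selected


# Toy example with made-up Rev-k-NN sets.
toy = {0: {0, 1, 2}, 1: {1, 2}, 2: {3}, 3: {3, 4}, 4: {4}}
print(greedy_select(toy, coverage_target=0.95))  # [0, 3]
```

Passing only budget follows the loop condition of Algorithm 1 as quoted, while passing only coverage_target mirrors the 95% coverage criterion described in the setup.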