Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
InduCE: Inductive Counterfactual Explanations for Graph Neural Networks
Authors: Samidha Verma, Burouj Armgaan, Sourav Medya, Sayan Ranu
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments on graph datasets, we show that incorporating edge additions and modelling the marginal effect of perturbations aid in generating better counterfactuals among the available recourse. Furthermore, inductive modeling enables InduCE to directly predict counterfactual perturbations without requiring instance-specific training. This leads to significant computational speed-up over baselines and allows counterfactual analyses for GNNs at scale. In this section, we benchmark InduCE against established baselines. The anonymized code base and datasets used in our evaluation are submitted with supplementary material. |
| Researcher Affiliation | Academia | Samidha Verma EMAIL Indian Institute of Technology, Delhi, India Burouj Armgaan EMAIL Indian Institute of Technology, Delhi, India Sourav Medya EMAIL University of Illinois, Chicago, USA Sayan Ranu EMAIL Indian Institute of Technology, Delhi, India |
| Pseudocode | Yes | The pseudocode of the training pipeline is provided in Alg. 2 (refer to App. A). Alg. 1 presents the pseudocode of the inference pipeline. |
| Open Source Code | Yes | The anonymized code base and datasets used in our evaluation are submitted with supplementary material. |
| Open Datasets | Yes | Benchmark Datasets: We use the same three benchmark graph datasets used in Tan et al. (2022); Lin et al. (2021); Lucic et al. (2022). Statistics of these datasets are listed in Table 2. Each dataset has an undirected base graph with pre-defined motifs attached to random nodes of the base graph, and randomly added additional edges to the overall graph. Real Dataset: We additionally use real-world datasets from the Amazon-photos co-purchase network Shchur et al. (2018) and ogbn-arxiv Wang et al. (2020). |
| Dataset Splits | Yes | For InduCE and Gem, we use a train/evaluation split of 80/20 on the benchmark graph and Amazon-Photos datasets. For ogbn-arxiv, we use the standard splits provided in the ogb package. In our experiments, we use a scaled-down version of the Amazon-Photos dataset. We choose one random node as the central node and take its 3-hop neighbourhood as our dataset. Amazon-Photos has an average degree of 13; hence, the 3-hop neighbourhood covers a reasonable distribution of class labels. We split the nodes of this subgraph in the ratio of 80:20 for train and test sets. |
| Hardware Specification | Yes | All reported experiments are conducted on an NVIDIA DGX Station with four V100 GPU cards having 128GB GPU memory, 256GB RAM, and a 20-core Intel Xeon E5-2698 v4 2.2 GHz CPU, running Ubuntu 18.04. |
| Software Dependencies | No | All reported experiments are conducted on an NVIDIA DGX Station with four V100 GPU cards having 128GB GPU memory, 256GB RAM, and a 20-core Intel Xeon E5-2698 v4 2.2 GHz CPU, running Ubuntu 18.04. We use PyTorch Geometric's (Fey & Lenssen, 2019) standard GCNConv layers (Kipf & Welling, 2016) that are compatible with sparse adjacency matrices to scale the black-box GNN to the million-sized graph. |
| Experiment Setup | Yes | The learning rate is 0.01. We use h = 4 because extracting the 4-hop neighbourhood as the subgraph ensured that we preserve the black-box model's accuracy. We use β = 0.5 so as to give equal weight to the prediction loss and distance loss (see Eq. 10). We use different values of γ ∈ {0.4, 0.6} and find the best performance at γ = 0.4 for the inductive setting and γ = 0.6 for the transductive setting, with a maximum perturbation budget δ = 15. We use a maximum number of episodes M = 80, 500, 500 for BA-shapes, Tree-cycles and Tree-grid respectively. We use GAT as the GNN of choice for the policy network. For the policy network, we use 3 GAT layers, 2 fully connected MLP layers, a hidden dimension of 16, a learning rate of 0.0003, and LeakyReLU with negative slope 0.1 as the activation function. We try three different values of η ∈ {0.1, 0.01, 0.001} and find that η = 0.1 improves performance due to the higher weight on exploration. |
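The Amazon-Photos split protocol quoted in the table (pick one random central node, take its 3-hop neighbourhood as the working subgraph, then split its nodes 80/20 into train/test) can be sketched as follows. This is a minimal illustration, not the authors' code: the use of `networkx`, the function name `three_hop_split`, and the seed handling are our own assumptions.

```python
import random
import networkx as nx

def three_hop_split(G, center, train_frac=0.8, seed=0):
    """Sketch of the described protocol: extract the 3-hop
    neighbourhood of `center` and split its nodes into
    train/test sets at the given ratio (assumed 80/20)."""
    # Induced subgraph on all nodes within 3 hops of the central node
    sub = nx.ego_graph(G, center, radius=3)
    nodes = list(sub.nodes())
    random.Random(seed).shuffle(nodes)  # reproducible shuffle
    cut = int(train_frac * len(nodes))
    return sub, nodes[:cut], nodes[cut:]

# Toy usage on a small random graph standing in for Amazon-Photos
G = nx.barabasi_albert_graph(200, 5, seed=42)
sub, train_nodes, test_nodes = three_hop_split(G, center=0)
```

With an average degree around 13, as noted above, a 3-hop ball already reaches a large, class-diverse portion of the graph, which is why a single central node suffices for the scaled-down dataset.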