Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Quantifying Distributional Invariance in Causal Subgraph for IRM-Free Graph Generalization

Authors: Yang Qiu, Yixiong Zou, Jun Wang, Wei Liu, Xiangyu Fu, Ruixuan Li

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on two widely used benchmarks demonstrate that our method consistently outperforms state-of-the-art methods in graph generalization.
Researcher Affiliation	Collaboration	(1) School of Computer Science and Technology, Huazhong University of Science and Technology; (2) i Wudao Tech
Pseudocode	No	The paper describes the methodology in Section 3 and illustrates the framework in Figure 4, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	Code is available at https://github.com/anders1123/IDG.
Open Datasets	Yes	We adopt two widely used benchmarks for graph OOD generalization, Graph OOD [8] and Drug OOD [15], across seven datasets: Motif, CMNIST, HIV, SST2, and Twitter from Graph OOD, and EC50 and IC50 from Drug OOD.
Dataset Splits	Yes	Each dataset contains one or more domains and is divided into domain-based splits, thereby introducing distribution shifts. ... As in prior work, we partition each dataset by its domain attribute to induce distribution shifts. For example, in the Motif basis-shift setting, the motif types in the test set are entirely disjoint from those in the training and validation sets, thus rigorously assessing model generalization.
Hardware Specification	Yes	Experiments in this paper are conducted on NVIDIA RTX3090 GPUs.
Software Dependencies	No	The paper mentions employing GIN as the backbone and describes optimization details, but it does not specify version numbers for any software components (e.g., Python, PyTorch, CUDA, GIN library version).
Experiment Setup	Yes	Following [9], we employ GIN for both the extractor and predictor, set (λ1, λ2) = (0.1, 0.01), and retain the original learning-rate and batch-size settings.
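
The domain-based splitting described in the Dataset Splits row can be illustrated with a minimal sketch. This is not the paper's or the benchmarks' code: the function name `split_by_domain`, the `(item, domain)` sample format, and the domain labels are hypothetical, chosen only to show test domains being kept disjoint from the training and validation domains.

```python
def split_by_domain(samples, train_domains, val_domains, test_domains):
    """Partition (item, domain) pairs into splits by domain label.

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    groups = {
        "train": set(train_domains),
        "val": set(val_domains),
        "test": set(test_domains),
    }
    # Test domains must not appear in the training or validation domains.
    if groups["test"] & (groups["train"] | groups["val"]):
        raise ValueError("test domains overlap with train/val domains")

    splits = {name: [] for name in groups}
    for item, domain in samples:
        # Assign each sample to the first split that lists its domain;
        # samples from unlisted domains are dropped.
        for name, domains in groups.items():
            if domain in domains:
                splits[name].append(item)
                break
    return splits


# Toy data: graph ids tagged with made-up domain labels.
samples = [
    ("g0", "domain_a"),
    ("g1", "domain_b"),
    ("g2", "domain_c"),
    ("g3", "domain_d"),
    ("g4", "domain_e"),
]
result = split_by_domain(
    samples,
    train_domains=["domain_a", "domain_b", "domain_c"],
    val_domains=["domain_d"],
    test_domains=["domain_e"],
)
print(result)
```

Because the split is driven by the domain attribute rather than by random sampling, the test split contains only samples from domains the model never saw during training, which is what induces the distribution shift.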