Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Quantifying Distributional Invariance in Causal Subgraph for IRM-Free Graph Generalization

Authors: Yang Qiu, Yixiong Zou, Jun Wang, Wei Liu, Xiangyu Fu, Ruixuan Li

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on two widely used benchmarks demonstrate that our method consistently outperforms state-of-the-art methods in graph generalization.
Researcher Affiliation | Collaboration | ¹School of Computer Science and Technology, Huazhong University of Science and Technology; ²i Wudao Tech
Pseudocode | No | The paper describes the methodology in Section 3 and illustrates the framework in Figure 4, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/anders1123/IDG.
Open Datasets | Yes | We adopt two widely used benchmarks for graph OOD generalization, Graph OOD [8] and Drug OOD [15], across seven datasets: Motif, CMNIST, HIV, SST2, and Twitter from Graph OOD, and EC50 and IC50 from Drug OOD.
Dataset Splits | Yes | Each dataset contains one or more domains and is divided into domain-based splits, thereby introducing distribution shifts. ... As in prior work, we partition each dataset by its domain attribute to induce distribution shifts. For example, in the Motif basis-shift setting, the motif types in the test set are entirely disjoint from those in the training and validation sets, thus rigorously assessing model generalization.
Hardware Specification | Yes | Experiments in this paper are conducted on NVIDIA RTX3090 GPUs.
Software Dependencies | No | The paper mentions employing GIN as the backbone and describes optimization details, but it does not specify version numbers for any software components (e.g., Python, PyTorch, CUDA, GIN library version).
Experiment Setup | Yes | Following [9], we employ GIN for both the extractor and predictor, set (λ1, λ2) = (0.1, 0.01), and retain the original learning-rate and batch-size settings.
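
The domain-based splitting described under Dataset Splits can be illustrated with a minimal Python sketch. It is not the paper's or the benchmarks' actual splitting code: the function name `split_by_domain`, the `(item, domain)` sample format, and the toy domain labels are all hypothetical, chosen only to show how partitioning by a domain attribute keeps the test domains disjoint from the training and validation domains.

```python
def split_by_domain(samples, train_domains, val_domains, test_domains):
    """Partition (item, domain) pairs into train/val/test by domain label.

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    train_domains = set(train_domains)
    val_domains = set(val_domains)
    test_domains = set(test_domains)

    # Enforce the property quoted above: test domains never appear
    # in the training or validation domains.
    if test_domains & (train_domains | val_domains):
        raise ValueError("test domains must be disjoint from train/val domains")

    splits = {"train": [], "val": [], "test": []}
    for item, domain in samples:
        if domain in train_domains:
            splits["train"].append(item)
        elif domain in val_domains:
            splits["val"].append(item)
        elif domain in test_domains:
            splits["test"].append(item)
        # Samples whose domain is not listed anywhere are dropped.
    return splits


# Toy data: graph identifiers paired with made-up domain labels.
samples = [
    ("g0", "domain_a"),
    ("g1", "domain_b"),
    ("g2", "domain_c"),
    ("g3", "domain_a"),
    ("g4", "domain_c"),
]
splits = split_by_domain(
    samples,
    train_domains=["domain_a"],
    val_domains=["domain_b"],
    test_domains=["domain_c"],
)
```

Because every test sample comes from a domain that is never seen during training or validation, accuracy on `splits["test"]` reflects generalization under distribution shift rather than in-distribution performance.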