Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Graph Data Selection for Domain Adaptation: A Model-Free Approach

Authors: Ting-Wei Li, Ruizhong Qiu, Hanghang Tong

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Through comprehensive empirical studies on several real-world graph-level datasets and multiple covariate shift types, we demonstrate that GRADATE outperforms existing selection methods and enhances off-the-shelf GDA methods with much fewer training data.
Researcher Affiliation	Academia	Ting-Wei Li University of Illinois Urbana-Champaign, IL USA EMAIL Ruizhong Qiu University of Illinois Urbana-Champaign, IL USA EMAIL Hanghang Tong University of Illinois Urbana-Champaign, IL USA EMAIL
Pseudocode	Yes	We summarize the overall procedure of Linear FGW in Algorithm 1. Appendix B goes through the steps to compute Graph Dataset Distance (GDD). The entire procedure is included in Algorithm 2. Appendix C summarizes the submodule GREAT used in our main algorithm (Algorithm 3). Appendix D summarizes our main algorithm GRADATE (Algorithm 4).
Open Source Code	No	We will provide the code package during submission and make the code available upon acceptance.
Open Datasets	Yes	Datasets and Graph Domains. We consider graph classification tasks conducted on six real-world graph-level datasets, including IMDB-BINARY [69], IMDB-MULTI [69], MSRC_21 [45], ogbg-molbace [22], ogbg-molbbbp [22] and ogbg-molhiv [22]. The former three datasets are from the TUDataset [44]; while the latter three datasets are from the OGB benchmark [22]. ... In Table 12, we provide details of datasets used in this work. For # NODES and # EDGES, we report the mean sizes across all graphs in the dataset.
Dataset Splits	Yes	Specifically, graphs are sorted by corresponding properties in an ascending order and split into train/val/test sets with ratios 60%/20%/20%.
Hardware Specification	Yes	The computation is performed on Linux with an NVIDIA Tesla V100-SXM2-32GB GPU.
Software Dependencies	No	We perform all our methods in Python and GNN models are built-in modules of PyTorch Geometric [14].
Experiment Setup	Yes	The learning rate is set to 10^-2 with weight decay 5e-4. We train 200 epochs for datasets IMDB-BINARY, IMDB-MULTI, MSRC_21 and 100 epochs for datasets ogbg-molbace, ogbg-molbbbp, ogbg-molhiv with early stopping, evaluating the test set on the model checkpoint that achieves the highest validation performance during training. For each combination of data and model, we report the mean and standard deviation of classification performance over 3-5 random trials. For TUDatasets, we use accuracy as the performance metric; for OGB datasets, we use AUCROC as the performance metric.
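
The Dataset Splits entry above describes sorting graphs by a property in ascending order and cutting the sorted list into 60%/20%/20% train/val/test sets. The following is a minimal sketch of that kind of split, not the authors' code: the function name `sorted_ratio_split`, the toy graph records, and the `num_nodes` property are all hypothetical, since the quoted text does not say which property is used for each dataset.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

G = TypeVar("G")


def sorted_ratio_split(
    graphs: Sequence[G],
    key: Callable[[G], float],
    percents: Tuple[int, int, int] = (60, 20, 20),
) -> Tuple[List[G], List[G], List[G]]:
    """Sort graphs ascending by `key`, then cut into train/val/test.

    `percents` are integer percentages so the cut points are computed
    with exact integer arithmetic. Any remainder goes to the test set.
    """
    if sum(percents) != 100:
        raise ValueError("percents must sum to 100")

    ordered = sorted(graphs, key=key)  # ascending order of the property
    n = len(ordered)
    n_train = n * percents[0] // 100
    n_val = n * percents[1] // 100

    train = ordered[:n_train]
    val = ordered[n_train:n_train + n_val]
    test = ordered[n_train + n_val:]
    return train, val, test


# Toy stand-ins for graphs; "num_nodes" is an assumed sorting property.
toy_graphs = [
    {"id": i, "num_nodes": size}
    for i, size in enumerate([7, 3, 9, 1, 5, 8, 2, 6, 4, 10])
]
train, val, test = sorted_ratio_split(toy_graphs, key=lambda g: g["num_nodes"])
```

Because the list is sorted before it is cut, the train set holds the graphs with the smallest property values and the test set the largest, which is what induces a covariate shift between the splits.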