Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

DuetGraph: Coarse-to-Fine Knowledge Graph Reasoning with Dual-Pathway Global-Local Fusion

Authors: Jin Li, Zezhong Ding, Xike Xie

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on various datasets demonstrate that DuetGraph achieves state-of-the-art (SOTA) performance, with up to an 8.7% improvement in reasoning quality and a 1.8× acceleration in training efficiency. Our code is available at https://github.com/USTC-DataDarknessLab/DuetGraph.git.
Researcher Affiliation	Academia	(1) School of Biomedical Engineering, University of Science and Technology of China (USTC); (2) School of Artificial Intelligence and Data Science, USTC; (3) Data Darkness Lab, Suzhou Institute for Advanced Research, USTC. EMAIL, EMAIL
Pseudocode	No	The paper describes the model architecture and steps in Sections 3.1 and 3.2, and visually in Figure 2. However, it does not include a figure, block, or section explicitly labeled "Pseudocode" or "Algorithm", nor are the structured steps formatted like code or an algorithm.
Open Source Code	Yes	Our code is available at https://github.com/USTC-DataDarknessLab/DuetGraph.git.
Open Datasets	Yes	Inductive Datasets. For inductive reasoning, following Liu et al. [19], we use the same data divisions of FB15k-237 [47], WN18RR [48], and NELL-995 [49]. Transductive Datasets. For transductive reasoning, we conduct experiments on four widely utilized KG reasoning datasets: FB15k-237 [47], WN18RR [48], NELL-995 [49], and YAGO3-10 [50], adopting the standard data splits provided by prior works [28, 51]. Triple Classification Datasets. For the triple classification task, we conduct experiments on three widely used knowledge graph datasets: UMLS [52], FB13 [53], and WN11 [53].
Dataset Splits	Yes	For inductive reasoning, following Liu et al. [19], we use the same data divisions of FB15k-237 [47], WN18RR [48], and NELL-995 [49]. Each division consists of 4 versions, resulting in 12 subsets in total. For transductive reasoning, we conduct experiments on four widely utilized KG reasoning datasets: FB15k-237 [47], WN18RR [48], NELL-995 [49], and YAGO3-10 [50], adopting the standard data splits provided by prior works [28, 51]. Appendix D.3 Dataset Statistics contains Table 9 and Table 10, which provide detailed breakdowns of train, valid, and test triplets for all datasets.
Hardware Specification	Yes	The experiments are conducted using Python 3.9.21, PyTorch 2.6.0, and CUDA 12.1, with an NVIDIA A100 80GB GPU.
Software Dependencies	Yes	The experiments are conducted using Python 3.9.21, PyTorch 2.6.0, and CUDA 12.1, with an NVIDIA A100 80GB GPU.
Experiment Setup	Yes	D.4 Hyperparameters Setup. Coarse-to-Fine reasoning model. In the coarse-grained reasoning stage, we directly adopt existing models without any modifications to their original hyperparameter settings. Dual-Pathway fusion model. For each dataset, we perform hyperparameter tuning on the validation set. We conduct grid search over the following hyperparameters: Learning rate: {10^-4, 5×10^-4, 10^-3, 5×10^-3, 10^-2}; Weight decay: {10^-5, 10^-4}; Hidden dimension: {16, 32, 64, 128}; Negative sampling size: {128, 256, 512}; Message passing layers in input encoder: {1, 2, 3}; Message passing layers in local pathway: {1, 2, 3}; Transformer layers in global pathway: {1, 2, 3}
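
To give a sense of the size of the search space quoted in the Experiment Setup row, the following is a minimal Python sketch that enumerates the Cartesian product of the listed values. The key names (`learning_rate`, `num_negatives`, and so on) and the helper `iter_configs` are hypothetical labels chosen for this illustration, not identifiers from the DuetGraph code, and the quoted text does not say whether the authors evaluated every combination or only a subset.

```python
from itertools import product

# Values copied from the Experiment Setup row (Appendix D.4 of the paper).
# The dictionary keys are hypothetical names chosen for this sketch.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 5e-4, 1e-3, 5e-3, 1e-2],
    "weight_decay": [1e-5, 1e-4],
    "hidden_dim": [16, 32, 64, 128],
    "num_negatives": [128, 256, 512],
    "encoder_layers": [1, 2, 3],   # message passing layers in input encoder
    "local_layers": [1, 2, 3],     # message passing layers in local pathway
    "global_layers": [1, 2, 3],    # Transformer layers in global pathway
}


def iter_configs(space):
    """Yield one dict per point in the Cartesian product of the search space."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


configs = list(iter_configs(SEARCH_SPACE))
print(len(configs))  # 3240 = 5 * 2 * 4 * 3 * 3 * 3 * 3
```

An exhaustive sweep would therefore cover 3,240 configurations per dataset, which helps put the reported single-GPU setup in context when planning a replication.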