Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

GD$^2$: Robust Graph Learning under Label Noise via Dual-View Prediction Discrepancy

Authors: Kailai Li, Jiong Lou, Jiawei Sun, Honghong Zeng, Wen Li, Chentao Wu, Yuan Luo, Wei Zhao, Shouguo Du, Jie Li

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on multiple datasets and noise settings demonstrate that GD2 achieves superior performance over state-of-the-art baselines. [...] Table 1: Node classification accuracy (%, mean ± std) on noisy datasets. [...] Table 2: Ablation of GD2 on the Computer and Roman-Empire datasets under the Pair-0.4 noise. [...] Figures 3 and 4 show hyperparameter sensitivity tests.
Researcher Affiliation	Academia	1 Department of Computer Science and Engineering, Shanghai Jiao Tong University; 2 Shenzhen University of Advanced Technology; 3 Yancheng Blockchain Research Institute; 4 Shanghai Key Laboratory of Trusted Data Circulation and Governance and Web3; 5 Shanghai University of International Business and Economics; 6 Shanghai Municipal Big Data Center
Pseudocode	Yes	The pseudo code of GD2 is presented in the Appendix E. [...] Algorithm 1 Pseudo code of GD2
Open Source Code	No	We have provided open access to the datasets and detailed experimental settings for reproducing results in the Appendix B. The code will be released after the paper has been accepted.
Open Datasets	Yes	We conduct experiments on six benchmark datasets, including four homophilous graphs: Computer, Photo, CS [29], and WikiCS [25], and two heterophilous graphs: Roman-Empire, and Amazon-Ratings [26]. Dataset statistics are provided in Appendix A.
Dataset Splits	Yes	Following previous works [12, 48], we randomly select 10% of the nodes for training, 10% for validation, and use the remaining nodes for testing.
Hardware Specification	Yes	All experiments are conducted on a Linux server with a NVIDIA GeForce RTX 4090 GPU with 24GB memory.
Software Dependencies	Yes	We implement the proposed GD2 with PyTorch 2.4.0 and torch_geometric 2.6.1.
Experiment Setup	Yes	Hyperparameters. We select the hyperparameters through a grid search, with the search ranges detailed in Table 4. All hyperparameters are tuned using validation sets. For baseline methods, hyperparameter tuning is conducted within the ranges recommended in their original publications. [...] Table 4: Search ranges for hyperparameters.
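
The Dataset Splits row above quotes a random 10% / 10% / 80% train/validation/test partition of the nodes. As a rough illustration only, such a split could be sketched as follows. This is not the authors' code, which had not been released at the time of classification. The function name `random_split`, the seed handling, and the use of Python's standard `random` module are assumptions made for this sketch; the paper's implementation presumably operates on PyTorch tensors.

```python
import random


def random_split(num_nodes, train_frac=0.1, val_frac=0.1, seed=0):
    """Randomly partition node indices into train/val/test index lists.

    Hypothetical helper: fractions follow the 10%/10%/remaining split
    quoted in the table; the seeding scheme is an assumption.
    """
    rng = random.Random(seed)          # local RNG so the split is repeatable
    indices = list(range(num_nodes))
    rng.shuffle(indices)               # random permutation of node ids

    n_train = int(num_nodes * train_frac)
    n_val = int(num_nodes * val_frac)

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]   # all remaining nodes
    return train, val, test


train_idx, val_idx, test_idx = random_split(1000, seed=42)
print(len(train_idx), len(val_idx), len(test_idx))
```

Fixing the seed makes a single split repeatable; repeated runs with different seeds would give the kind of variation summarized by a mean and standard deviation over runs.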