Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Geometric Logit Decoupling for Energy-Based Graph Out-of-distribution Detection

Authors: Min Wang, Hao Yang, Qing Cheng, Jincai Huang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments demonstrate that Geo Energy consistently improves OOD detection performance and confidence reliability across various benchmarks and distribution shifts. ... We conduct graph OOD detection experiments under two settings to thoroughly evaluate our approach. ... We assess OOD detection and ID accuracy using key metrics: ID Accuracy, AUROC, AUPR, and FPR95.
Researcher Affiliation	Academia	Min Wang, Hao Yang, Qing Cheng, Jincai Huang National University of Defense Technology EMAIL; EMAIL
Pseudocode	No	The paper describes the methods and formulations mathematically and textually (e.g., equations 1-11 and descriptions in sections 2 and 3), but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code	Yes	We release anonymized code and data preprocessing scripts at the URL in the supplemental; detailed README guides reproduction of all figures.
Open Datasets	Yes	Following recent work on graph OOD detection [13, 16, 15], our experiments use five benchmark datasets to reflect real-world scenarios with OOD instances. These datasets cover two scenarios: ... (1) Cora, Citeseer, Pubmed [1]: OOD data is synthetically generated... (2) ogbn-Arxiv [17]: This large citation dataset... Twitch-Explicit [18]: It derived from the Twitch streaming platform...
Dataset Splits	Yes	ID data is split into training, validation, and testing sets in a 1:1:8 ratio. ... For Cora, we split the ID data following the semi-supervised learning setting by [1]. ... ID data is randomly split into training, validation, and testing sets in a 1:1:8 ratio. ... For semi-supervised node classification, we set the label rate L/C ∈ {20, 40, 60}.
Hardware Specification	Yes	Appendix D reports use of NVIDIA V100 GPUs (4), total training time (48 GPU hours), and memory footprints per dataset.
Software Dependencies	No	The paper does not explicitly state specific software versions (e.g., Python, PyTorch, CUDA versions) in the provided text sections. While it mentions Appendix C might contain YAML config files, the content for specific versions is not visible in the provided text.
Experiment Setup	Yes	For semi-supervised node classification, we set the label rate L/C ∈ {20, 40, 60}. ... Table 10: Optimal hyperparameter selection for the scaling factor s in GNNSafe(+Geo Energy) and GNNSafe++(+Geo Energy) across multiple datasets and OOD types. ... Table 12: Summary of used parameters on model calibration in GCN, GAT and GraphSAGE.
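
Illustration: the random 1:1:8 training/validation/testing split quoted in the Dataset Splits row could be sketched roughly as below. This is a minimal sketch, not the authors' procedure: the function name split_indices, the seed, and the example node count of 1000 are assumptions made here for illustration, and the paper's actual splitting code may differ (for example, Cora follows the semi-supervised setting of [1] instead).

```python
import random


def split_indices(n, ratios=(1, 1, 8), seed=0):
    """Randomly partition range(n) into train/val/test index lists.

    Hypothetical helper: sizes follow the given ratio, with any
    rounding remainder assigned to the test set.
    """
    idx = list(range(n))
    # A dedicated Random instance keeps the shuffle reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(idx)

    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total

    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_indices(1000)
    print(len(train), len(val), len(test))  # 100 100 800
```

Fixing the seed makes the partition repeatable across runs, which is the property a reader would need in order to reproduce reported numbers from a random split.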