Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Geometric Logit Decoupling for Energy-Based Graph Out-of-distribution Detection
Authors: Min Wang, Hao Yang, Qing Cheng, Jincai Huang
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that Geo Energy consistently improves OOD detection performance and confidence reliability across various benchmarks and distribution shifts. ... We conduct graph OOD detection experiments under two settings to thoroughly evaluate our approach. ... We assess OOD detection and ID accuracy using key metrics: ID Accuracy, AUROC, AUPR, and FPR95. |
| Researcher Affiliation | Academia | Min Wang, Hao Yang, Qing Cheng, Jincai Huang; National University of Defense Technology; EMAIL; EMAIL |
| Pseudocode | No | The paper describes the methods and formulations mathematically and textually (e.g., equations 1-11 and descriptions in sections 2 and 3), but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | We release anonymized code and data preprocessing scripts at the URL in the supplemental; detailed README guides reproduction of all figures. |
| Open Datasets | Yes | Following recent work on graph OOD detection [13, 16, 15], our experiments use five benchmark datasets to reflect real-world scenarios with OOD instances. These datasets cover two scenarios: ... (1) Cora, Citeseer, Pubmed [1]: OOD data is synthetically generated... (2) ogbn-Arxiv [17]: This large citation dataset... Twitch-Explicit [18]: It derived from the Twitch streaming platform... |
| Dataset Splits | Yes | ID data is split into training, validation, and testing sets in a 1:1:8 ratio. ... For Cora, we split the ID data following the semi-supervised learning setting by [1]. ... ID data is randomly split into training, validation, and testing sets in a 1:1:8 ratio. ... For semi-supervised node classification, we set the label rate L/C ∈ {20, 40, 60}. |
| Hardware Specification | Yes | Appendix D reports use of NVIDIA V100 GPUs (4), total training time (48 GPU hours), and memory footprints per dataset. |
| Software Dependencies | No | The paper does not explicitly state specific software versions (e.g., Python, PyTorch, CUDA versions) in the provided text sections. While it mentions Appendix C might contain YAML config files, the content for specific versions is not visible in the provided text. |
| Experiment Setup | Yes | For semi-supervised node classification, we set the label rate L/C ∈ {20, 40, 60}. ... Table 10: Optimal hyperparameter selection for the scaling factor s in GNNSafe(+Geo Energy) and GNNSafe++(+Geo Energy) across multiple datasets and OOD types. ... Table 12: Summary of used parameters on model calibration in GCN, GAT and Graph SAGE. |
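
The Dataset Splits and Research Type rows mention a random 1:1:8 split of the ID data and the FPR95 metric. The following is a minimal sketch of what those two steps could look like, not the authors' code. The function names `split_1_1_8` and `fpr_at_95_tpr` are hypothetical, and the sketch assumes that a higher score means "more in-distribution" and that ID samples are treated as the positive class; the paper's implementation may use a different convention.

```python
import random


def split_1_1_8(node_ids, seed=0):
    """Randomly split IDs into train/val/test parts in a 1:1:8 ratio.

    Hypothetical helper: the seed and rounding rule are assumptions.
    """
    ids = list(node_ids)
    random.Random(seed).shuffle(ids)
    n_part = len(ids) // 10  # one tenth each for train and validation
    train = ids[:n_part]
    val = ids[n_part:2 * n_part]
    test = ids[2 * n_part:]  # remaining ~8 tenths
    return train, val, test


def fpr_at_95_tpr(id_scores, ood_scores):
    """Fraction of OOD samples accepted when 95% of ID samples are accepted.

    Assumes higher score = more in-distribution (an assumption of this sketch).
    """
    ranked = sorted(id_scores, reverse=True)
    n = len(ranked)
    keep = (95 * n + 99) // 100  # ceil(0.95 * n) using integer arithmetic
    threshold = ranked[keep - 1]  # lowest score among the kept ID samples
    accepted_ood = sum(1 for s in ood_scores if s >= threshold)
    return accepted_ood / len(ood_scores)


if __name__ == "__main__":
    train, val, test = split_1_1_8(range(100))
    print(len(train), len(val), len(test))
    print(fpr_at_95_tpr(list(range(1, 21)), [0, 1, 2, 3]))
```

Integer arithmetic is used for the 95% cut-off so that the threshold index does not depend on floating-point rounding.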