Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Redundancy-Aware Test-Time Graph Out-of-Distribution Detection

Authors: Yue Hou, He Zhu, Ruomei Liu, Yingke Su, Junran Wu, Ke Xu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on real-world datasets demonstrate the superior performance of RedOUT on OOD detection. Specifically, our method achieves an average improvement of 6.7%, significantly surpassing the best competitor by 17.3% on the ClinTox/LIPO dataset pair.
Researcher Affiliation	Academia	Yue Hou (1,2), He Zhu (1), Ruomei Liu (1), Yingke Su (2), Junran Wu (1), Ke Xu (1). (1) State Key Laboratory of Complex & Critical Software Environment, Beihang University; (2) Shen Yuan Honors College, Beihang University.
Pseudocode	Yes	Algorithm 1: Coding tree construction with height k via structural entropy minimization. Algorithm 2: Overall optimization process of RedOUT.
Open Source Code	Yes	The code of RedOUT is available at: https://github.com/name-is-what/RedOUT.
Open Datasets	Yes	For OOD detection, we employ 10 pairs of datasets from two mainstream graph data benchmarks (i.e., TUDataset [24] and OGB [11]) following GOOD-D [20]. We also conduct experiments on anomaly detection settings, where the samples in minority class or real anomalous class are viewed as anomalies. Further details are shown in Appendix E.1.
Dataset Splits	Yes	90% of ID samples are used for training, and 10% of ID samples and the same number of OOD samples are integrated together for testing. The partitioning of ID samples for training, along with the division of ID and OOD samples for testing, follows GOOD-D [20].
Hardware Specification	Yes	CPU: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz, 256GB RAM. GPU: Tesla V100 PCIe 32GB GPU.
Software Dependencies	Yes	Software: Python 3.7, Pytorch 1.8, CUDA 11.0, and Pytorch-Geometric 2.0.1.
Experiment Setup	Yes	Detailed settings and additional results can be found in Appendix E. In this study, inspired by GOOD-D [20], we employ 5 layers of GIN [46] as the backbone and adopt a perturbation-free augmentation strategy to construct view Gγ = (A, P)... To analyze the sensitivity of λ for RedOUT, we alter the value from 1e-04 to 1. The AUC w.r.t. different selections of λ is plotted in Figure 6.
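
The split described in the Dataset Splits row (90% of ID samples for training; the remaining 10% of ID samples combined with an equal number of OOD samples for testing) can be illustrated roughly as follows. This is a minimal sketch, not the authors' code: the function name `split_id_ood`, the random shuffling, the seed, and the 0/1 label convention are all assumptions made for illustration. The paper states only that its partitioning follows GOOD-D [20].

```python
import random


def split_id_ood(id_samples, ood_samples, train_frac=0.9, seed=0):
    """Hypothetical helper: build an ID-only training set and a balanced ID/OOD test set.

    Assumptions (not from the paper): samples are shuffled with a fixed seed,
    and test items are labeled 0 for ID and 1 for OOD.
    """
    rng = random.Random(seed)

    # Shuffle a copy so the caller's list is left untouched.
    id_shuffled = list(id_samples)
    rng.shuffle(id_shuffled)

    # 90% of ID samples go to training; the rest are held out for testing.
    n_train = int(round(train_frac * len(id_shuffled)))
    train = id_shuffled[:n_train]
    test_id = id_shuffled[n_train:]

    # Draw the same number of OOD samples as held-out ID samples.
    if len(ood_samples) < len(test_id):
        raise ValueError("not enough OOD samples to match the ID test set")
    test_ood = rng.sample(list(ood_samples), len(test_id))

    test = [(g, 0) for g in test_id] + [(g, 1) for g in test_ood]
    return train, test


if __name__ == "__main__":
    # Integers stand in for graphs in this toy usage example.
    train, test = split_id_ood(list(range(100)), list(range(1000, 1050)))
    print(len(train), len(test))
```

With 100 ID items and 50 OOD items, this yields 90 training items and a 20-item test set split evenly between ID and OOD. Reproducing the reported numbers would require the exact partition used by GOOD-D rather than a fresh random shuffle.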