Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Spreading Out-of-Distribution Detection on Graphs
Authors: Daeho Um, Jongin Lim, Sunoh Kim, Yuneil Yeo, Yoonho Jung
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experimental results demonstrate the superiority of our approach over state-of-the-art methods in both spreading OOD detection and conventional node-level OOD detection tasks across seven benchmark datasets. The source code is available at https://github.com/daehoum1/edbd. ... We conduct extensive experiments on 1) spreading OOD detection and 2) label leave-out. ... Tables 1, 2, 3, 4 present detailed performance metrics, comparisons with baselines, and ablation studies. |
| Researcher Affiliation | Collaboration | Daeho Um (AI Center, Samsung Electronics); Jongin Lim (AI Center, Samsung Electronics); Sunoh Kim (Computer Engineering, Dankook University); Yuneil Yeo (Department of Civil and Environmental Engineering, UC Berkeley); Yoonho Jung (Department of Electrical and Computer Engineering, Seoul National University) |
| Pseudocode | No | The paper describes methods and processes using mathematical equations and textual explanations, for example in sections 3.3 'OOD Spreading Scheme' and 4.2 'Energy Distribution-Based Aggregation', but it does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block, nor does it present structured steps formatted explicitly as an algorithm. |
| Open Source Code | Yes | The source code is available at https://github.com/daehoum1/edbd. |
| Open Datasets | Yes | We curate realistic benchmarks by employing the epidemic spreading models that simulate the spreading of OOD nodes on the graph. We also showcase a Spreading COVID-19 dataset to demonstrate the applicability of spreading OOD detection in real-world scenarios. ... We utilize a graph structure from the LastFM Asia graph (Rozemberczki & Sarkar, 2020). ... We conduct additional experiments on the OGBN-Arxiv dataset, which contains 169,343 nodes, representing a large-scale graph. ... For label leave-out, following the setup in Wu et al. (2023), we perform experiments on four benchmark datasets: Cora (Sen et al., 2008), Amazon-Photo (Shchur et al., 2018), Amazon Computers (Shchur et al., 2018), and Coauthor-CS (Sinha et al., 2015). ... We provide the Spreading COVID-19 dataset in the supplementary material. |
| Dataset Splits | Yes | For evaluation, we assume a graph G = (V, E, X) is given, and formulate a graph-based OOD problem. Specifically, we define G to be a graph consisting of N ID samples in D_in^test as nodes of the graph. ... Among these episodes, 5 episodes are designated as the validation set and 10 episodes as the test set for OOD detection. ... For a training/validation/test split on ID nodes in the Cora datasets, we adhere to the split used in Kipf & Welling (2016). For splits on ID nodes in the Amazon-Photo and Coauthor-CS datasets, we use random splits for training, validation, and test nodes with proportions of 0.1, 0.1, and 0.8, respectively. |
| Hardware Specification | Yes | We conduct all the experiments on a single NVIDIA GeForce RTX 2080 Ti GPU with 11GB memory and an Intel Core i5-6600 CPU @ 3.30 GHz. |
| Software Dependencies | No | For training, we leverage Adam optimizer (Kingma & Ba, 2014) and set the maximum number of epochs to 200. ... All the datasets are provided in PyTorch Geometric (Fey & Lenssen, 2019). ... For all the baselines except for OODGAT, we utilize the implementations provided in the GitHub repository released by Wu et al. (2023). |
| Experiment Setup | Yes | For all experiments, hyperparameters are tuned on validation sets. Further experimental details are provided in Appendix B. ... For training, we leverage Adam optimizer (Kingma & Ba, 2014) and set the maximum number of epochs to 200. We report test performance at an epoch which yields the lowest validation loss. Learning rates are selected within {0.01, 0.001, 0.0001} by using a grid search. ... EDBD-specific hyperparameters (α, β, ϵ) are selected from {(α, β, ϵ) | α ∈ {0.1, 0.2, 0.3, 0.5}, β ∈ {1, 1/4}, ϵ ∈ {0.01, 0.05, 0.1, 0.5, 0.75}} based on validation sets. The number of aggregations, K, is chosen from {1, 2}. |
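The reported EDBD hyperparameter grid (α, β, ϵ, K) can be reproduced as an exhaustive validation-set search. The sketch below is a minimal illustration, not the authors' code: `validation_loss` is a hypothetical placeholder standing in for training the model with a given configuration and scoring it on the validation set.

```python
from itertools import product

# Hyperparameter grid as reported in the paper's experiment setup.
alphas = [0.1, 0.2, 0.3, 0.5]
betas = [1, 1/4]
epsilons = [0.01, 0.05, 0.1, 0.5, 0.75]
ks = [1, 2]  # number of aggregations, K

def validation_loss(alpha, beta, eps, k):
    """Hypothetical stand-in: the real pipeline would train EDBD with this
    configuration and return the loss on the validation episodes."""
    return abs(alpha - 0.2) + abs(beta - 1) + abs(eps - 0.1) + abs(k - 1)

# Exhaustive grid search: 4 * 2 * 5 * 2 = 80 configurations.
best = min(product(alphas, betas, epsilons, ks),
           key=lambda cfg: validation_loss(*cfg))
print(best)
```

With a real scoring function, the configuration minimizing validation loss would then be evaluated once on the test episodes, matching the paper's "report test performance at the epoch with the lowest validation loss" protocol.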