Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Uncertainty Estimation on Graphs with Structure Informed Stochastic Partial Differential Equations

Authors: Fred Xu, Thomas Markovich

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our extensive experiments on Out-of-Distribution (OOD) detection on graph datasets with varying label informativeness demonstrate the soundness and superiority of our model to existing approaches. In section 4.1, we demonstrate model performance on Out-of-distribution (OOD) detection on semi-supervised node classification task under the influence of label, feature, and structure shifts, on both low LI and high LI datasets. In section 4.2, we examine the impact of kernel smoothness parameter ν on model performance with respect to label informativeness. In section 4.3, we provide a graph rewiring perspective on stochastic message passing.
Researcher Affiliation	Industry	Fred Xu (Block Inc; University of California, Los Angeles), Thomas Markovich (Block Inc)
Pseudocode	No	The paper describes methods and equations in sections like '3.1 Φ-Wiener Process on Graph', '3.2 Structure Informed Graph SPDE', and '3.3 Practical Implementation: Graph ODE with Random Forcing', but does not contain a dedicated pseudocode or algorithm block.
Open Source Code	No	We will release the open sourced code to GitHub after the paper is accepted.
Open Datasets	Yes	Our extensive experiments on Out-of-Distribution (OOD) detection on graph datasets with varying label informativeness demonstrate the soundness and superiority of our model to existing approaches. Our extensive experiments on Out-of-Distribution (OOD) detection on 8 graph datasets with varying degrees of label informativeness to demonstrate our model's effectiveness. This includes heterophilous / low LI datasets [36] (Roman Empire, Amazon Ratings, Minesweeper, Tolokers, and Questions) and high LI datasets (Cora [30], Citeseer [43], and Pubmed [34]).
Dataset Splits	Yes	The train-validation-test splits of these three datasets follow the standard practice of previous works. For the experiment of label leave out, we follow the common practices for the homophilous datasets; for heterophilous datasets, we either choose the last label or more labels to alleviate class imbalance. Below we present the OOD label choice for all datasets: Cora: class labels 4, 5, 6 as IND while class labels 0, 1, 2, 3 as OOD samples. Pubmed: class label 1, 2 as IND samples while class labels 0 as OOD samples. Citeseer: class labels 3, 4, 5 as IND while class labels 0, 1, 2 as OOD samples. Tolokers: class 0 as IND while class 1 as OOD samples. Roman Empire: class labels 0-8 as IND samples while class labels 9-17 as OOD samples. Amazon Ratings: class labels 0, 1, 2 as IND samples while class labels 3, 4, 5 as OOD samples. Minesweeper: class 0 as IND while class 1 as OOD samples. Questions: class 0 as IND while class 1 as OOD samples.
Hardware Specification	Yes	The experiments are conducted on the Mosaic ML platform with 8 H100 GPUs, and the cluster docker image is the latest AWS image with Ubuntu 20.04.
Software Dependencies	Yes	We implement our pipeline using Python 3.12 and PyTorch version 2.5.1 with CUDA 12.4 support.
Experiment Setup	Yes	For hyperparameter search, we perform grid search on the following parameters: kernel smoothness ν: {0.1, 0.3, 0.5, 1.0, 3.0, 4.0, 5.0, 10.0, 20.0, 50.0}; latent dimension: {64, 128, 256, 512, 1024}; learning rate: {1e-4, 1e-3, 1e-2}; weight decay: {1e-4, 1e-3, 1e-2}; dropout: {0.0, 0.5}; Chebyshev polynomial order: {30, 40, 50, 60, 80, 100}; training sample times: {1, 3, 5, 7}.
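
To make the size of the search space in the Experiment Setup row concrete, the grid can be enumerated with a short Python sketch. This is only an illustration, not the authors' code: the dictionary keys and the helper names `iter_configs` and `grid_size` are hypothetical, and the learning-rate and weight-decay values assume the exponents are negative (1e-4, 1e-3, 1e-2), since the minus signs appear to have been lost in extraction.

```python
from itertools import product
from math import prod

# Values transcribed from the Experiment Setup row. The key names are
# hypothetical labels chosen for this sketch, not identifiers from the paper.
# Assumption: learning-rate and weight-decay exponents are negative.
GRID = {
    "nu": [0.1, 0.3, 0.5, 1.0, 3.0, 4.0, 5.0, 10.0, 20.0, 50.0],
    "latent_dim": [64, 128, 256, 512, 1024],
    "lr": [1e-4, 1e-3, 1e-2],
    "weight_decay": [1e-4, 1e-3, 1e-2],
    "dropout": [0.0, 0.5],
    "chebyshev_order": [30, 40, 50, 60, 80, 100],
    "train_samples": [1, 3, 5, 7],
}


def iter_configs(grid):
    """Yield one dict per point of the Cartesian product of the grid."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def grid_size(grid):
    """Number of configurations, computed without enumerating them."""
    return prod(len(v) for v in grid.values())


if __name__ == "__main__":
    print(grid_size(GRID))           # 21600
    print(next(iter_configs(GRID)))  # first configuration in the grid
```

A full sweep over this grid would cover 21,600 configurations per dataset. The source does not say whether the search was exhaustive or restricted to a subset.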