Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Sketch-Augmented Features Improve Learning Long-Range Dependencies in Graph Neural Networks

Authors: Ryien Hosseini, Filippo Simini, Venkatram Vishwanath, Rebecca Willett, Henry Hoffmann

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on real-world graph learning tasks confirm that this strategy consistently improves performance over baseline GNNs, offering both a standalone solution and a complementary enhancement to existing techniques such as graph positional encodings.
Researcher Affiliation | Academia | 1 University of Chicago; 2 Argonne National Laboratory; 3 NSF-Simons National Institute for Theory and Mathematics in Biology
Pseudocode | Yes | Algorithm 1: Sketched Feature GNN
Open Source Code | Yes | Our source code is available at https://github.com/ryienh/sketched-random-features.
Open Datasets | Yes | We evaluate SRF's ability to mitigate oversquashing using the Tree Neighbors Match synthetic benchmark [2] discussed in Section 2. ... We evaluate the universal approximation capabilities of SRF-enhanced MPGNNs using two graph isomorphism discrimination benchmarks: CSL [54] and EXP [1]... We evaluate on REDDIT-B and REDDIT-M datasets [69]... We evaluate OOD generalization on DrugOOD [33]... To assess whether SRF complements structural approaches, we evaluate on Peptides-struct [22], a benchmark with long-range dependencies where graph transformers typically excel.
Dataset Splits | Yes | Reddit Datasets. Following standard practice [36], we employ 10-fold cross-validation and report results for the best epoch across 330 training epochs. ... DrugOOD. We follow the experimental setup from [36], training for 150 epochs and reporting results on the out-of-distribution test set using L1 loss. ... Peptides-struct. Models are trained for 500 epochs using L1 loss with residual connections.
Hardware Specification | Yes | All performance evaluations were conducted using an AMD EPYC 7713 64-Core Processor running Red Hat Enterprise Linux 9.3 and a NVIDIA DGX A100 GPUs (80GB memory). At times during experimentation, a cluster of 8 such GPUs were used to run parallel experiments.
Software Dependencies | Yes | Our graph processing and learning experiments utilize the open-source PyTorch Geometric library as the primary framework, with NetworkX serving as a supplementary tool for graph operations. ... To ensure reproducibility and accurate runtime evaluation, we include a catalog of all software dependencies and their specific versions in the Supplementary material.
Experiment Setup | Yes | All experiments use SRF with parameters k = 8 and search over the SRF hyperparameter D k {16, 32, 64, 128, 256} (See Section 3). Hyperparameter optimization is conducted using Weights and Biases across all datasets. The Adam optimizer is used throughout all experiments. ... Reddit Datasets. ... The hyperparameter grid search includes: number of layers {3, 4, 5, 6, 7, 8}, hidden dimensions {32, 64, 128, 256, 512}, batch size {16, 32, 64, 128}, and learning rate [10^-5, 10^-2]. ... DrugOOD. ... The hyperparameter search includes: number of layers {3, 4, 5, 6, 7}, batch size {16, 32, 64, 128}, learning rate [10^-5, 10^-2], layer normalization {true, false}, batch normalization {true, false}, and hidden dimensions {32, 64, 80, 90, 100, 110}. ... Peptides-struct. ... The hyperparameter search includes: number of layers {4, 5, 6, 7, 8, 9}, hidden dimensions {70, 95, 105, 135, 150}, batch size {10, 25, 50, 75}, learning rate [10^-5, 10^-2], layer normalization {true, false}, and batch normalization {true, false}.
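
For readers who want to see the Reddit search space from the Experiment Setup entry in concrete form, the following is a minimal Python sketch. It is illustrative only and is not taken from the authors' repository. The names `REDDIT_SPACE`, `LR_RANGE`, `grid_size`, `sample_config`, and the dictionary keys are hypothetical. The learning rate is assumed to be drawn log-uniformly from the stated range, since the source gives only the interval. The `srf_dim` values are the ones quoted for the SRF hyperparameter.

```python
import math
import random

# Discrete choices quoted in the Experiment Setup entry for the Reddit datasets.
# Key names are hypothetical; only the value lists come from the source.
REDDIT_SPACE = {
    "num_layers": [3, 4, 5, 6, 7, 8],
    "hidden_dim": [32, 64, 128, 256, 512],
    "batch_size": [16, 32, 64, 128],
    "srf_dim": [16, 32, 64, 128, 256],
}

# Learning-rate interval quoted in the source.
LR_RANGE = (1e-5, 1e-2)


def grid_size(space):
    """Number of distinct combinations of the discrete choices."""
    return math.prod(len(values) for values in space.values())


def sample_config(space, lr_range, rng):
    """Draw one configuration: uniform over discrete choices,
    log-uniform over the learning rate (an assumption, see lead-in)."""
    config = {name: rng.choice(values) for name, values in space.items()}
    low, high = lr_range
    exponent = rng.uniform(math.log10(low), math.log10(high))
    config["lr"] = 10.0 ** exponent
    return config


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs draw the same config
    print("discrete combinations:", grid_size(REDDIT_SPACE))
    print("one sampled config:", sample_config(REDDIT_SPACE, LR_RANGE, rng))
```

The discrete part alone gives 6 × 5 × 4 × 5 = 600 combinations before the learning rate is considered, which suggests why a sweep tool is a natural fit here. The DrugOOD and Peptides-struct spaces could be expressed the same way by swapping in their value lists and adding the two normalization flags.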