Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Generative Graph Pattern Machine

Authors: Zehong Wang, Zheyuan Zhang, Tianyi Ma, Chuxu Zhang, Yanfang Ye

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirically, G2PM demonstrates strong scalability: on the ogbn-arxiv benchmark, it continues to improve with model sizes up to 60M parameters, outperforming prior generative approaches that plateau at significantly smaller scales (e.g., 3M). In addition, we systematically analyze the model design space, highlighting key architectural choices that contribute to its scalability and generalization. Across diverse tasks, including node/link/graph classification, transfer learning, and cross-graph pretraining, G2PM consistently outperforms strong baselines, establishing a compelling foundation for scalable graph learning.
Researcher Affiliation	Academia	University of Notre Dame; University of Connecticut
Pseudocode	No	The paper does not contain any clearly labeled pseudocode or algorithm blocks. It describes methodologies in text and mathematical formulas, accompanied by figures.
Open Source Code	Yes	The code and dataset are available at https://github.com/Zehong-Wang/G2PM.
Open Datasets	Yes	We evaluate G2PM on a suite of homophily graphs of varying scales, including Pubmed, Photo, Computers, WikiCS, Flickr, ogbn-arxiv, and ogbn-products (see Table 2 for dataset statistics). ... We use three datasets, including Cora, Pubmed, and ogbl-collab, and follow a standard 80/5/15 split for training, validation, and test sets. ... We evaluate G2PM on seven datasets: five molecular graphs (HIV, PCBA, SIDER, MUV, ClinTox) and two social networks (IMDB-B, Reddit-M12K).
Dataset Splits	Yes	We adopt a linear probe setup: node embeddings are frozen after pre-training and used to train a separate classifier, where we use a 10/10/80 random split for Pubmed, Photo, and Computers, and the official split for the remaining datasets. ... We use three datasets, including Cora, Pubmed, and ogbl-collab, and follow a standard 80/5/15 split for training, validation, and test sets. ... We use public splits for molecular graphs and 80/10/10 random splits for social networks, following a linear probe protocol.
Hardware Specification	Yes	Most experiments are conducted on Linux servers equipped with four Nvidia A40 GPUs.
Software Dependencies	Yes	The models are implemented using PyTorch 2.4.0, PyTorch Geometric 2.6.1, and PyTorch Cluster 1.6.3, with CUDA 12.1 and Python 3.9.
Experiment Setup	Yes	In our setup, we use the AdamW optimizer with weight decay and set the number of epochs to 100. All experiments are conducted five times with different random seeds. The batch size is set to 256 by default. We present the detailed default setup in Table 7. ... We perform hyperparameter tuning over the following ranges: learning rate {1e-3, 7e-4, 5e-4, 3e-4, 1e-4, 7e-5, 5e-5, 3e-5, 1e-5}, pattern size {4, 8, 16}, feature augmentation ratio p_feat in [0.0, 0.9], and substructure augmentation ratio p_sub in [0.0, 0.9]. The final selected hyperparameters are reported in Table 8.
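The Dataset Splits row describes ratio-based random splits: 10/10/80 for Pubmed, Photo, and Computers, 80/5/15 for link prediction, and 80/10/10 for the social networks. The following is a minimal sketch of such a seeded split over item indices (nodes, edges, or graphs), assuming plain uniform shuffling. Whether the G2PM code stratifies by class or relies on PyTorch Geometric helpers such as `RandomNodeSplit` or `RandomLinkSplit` is not stated in this entry, and the function name `random_split` is hypothetical.

```python
import random
from typing import List, Tuple


def random_split(num_items: int, train_frac: float, val_frac: float,
                 seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle indices 0..num_items-1 and cut them into train/val/test parts.

    Hypothetical helper (assumed uniform shuffling, no stratification).
    The test part receives everything left after train and val, so a
    10/10/80 split is random_split(n, 0.1, 0.1).
    """
    if train_frac < 0 or val_frac < 0 or train_frac + val_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    indices = list(range(num_items))
    random.Random(seed).shuffle(indices)  # seeded, so a run can be repeated exactly
    n_train = round(num_items * train_frac)
    n_val = round(num_items * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test
```

Under the linear probe protocol described in that row, the train indices would select the labels used to fit the separate classifier on frozen embeddings, while validation and test indices are held out for model selection and reporting.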
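The Software Dependencies row pins exact versions. A small, hypothetical helper like the one below could compare a local environment against those pins before rerunning experiments. It only compares version strings; the `installed` dictionary is assumed to be filled in by the user, for example from `torch.__version__`, `torch.version.cuda`, `torch_geometric.__version__`, `torch_cluster.__version__`, and `platform.python_version()`. Local build tags such as `+cu121` are ignored, and a shorter reported version such as Python `3.9` matches any `3.9.x`.

```python
from typing import Dict, List

# Versions reported in the Software Dependencies row.
REPORTED = {
    "torch": "2.4.0",
    "torch_geometric": "2.6.1",
    "torch_cluster": "1.6.3",
    "cuda": "12.1",
    "python": "3.9",
}


def base_version(version: str) -> str:
    """Drop a local build tag such as '+cu121' from a version string."""
    return version.split("+", 1)[0]


def mismatches(installed: Dict[str, str],
               reported: Dict[str, str] = REPORTED) -> List[str]:
    """List components whose installed version differs from the reported one.

    Only as many dot-separated components as the reported version has are
    compared, so '3.9.18' satisfies the reported Python '3.9'.
    """
    problems = []
    for name, want in reported.items():
        have = installed.get(name)
        if have is None:
            problems.append(f"{name}: not found (reported {want})")
            continue
        want_parts = want.split(".")
        have_parts = base_version(have).split(".")[:len(want_parts)]
        if have_parts != want_parts:
            problems.append(f"{name}: {have} (reported {want})")
    return problems
```

An empty result means the environment matches the reported pins at the stated precision; it says nothing about driver versions or GPU model.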
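The Experiment Setup row lists a search space but not the search strategy or the step size for p_feat and p_sub. The sketch below assumes random search: learning rate and pattern size are drawn from the listed sets, and the two augmentation ratios are drawn uniformly from [0.0, 0.9]. It also shows the mean and sample standard deviation one would report over the five seeded runs. The names `sample_configs` and `summarize` are hypothetical, and the training call itself is omitted.

```python
import random
import statistics
from typing import Dict, List, Tuple

# Discrete choices listed in the Experiment Setup row.
LEARNING_RATES = [1e-3, 7e-4, 5e-4, 3e-4, 1e-4, 7e-5, 5e-5, 3e-5, 1e-5]
PATTERN_SIZES = [4, 8, 16]
# Range for both augmentation ratios; uniform sampling is an assumption.
P_RANGE = (0.0, 0.9)


def sample_configs(n: int, seed: int = 0) -> List[Dict[str, float]]:
    """Draw n configurations from the search space (assumed random search)."""
    rng = random.Random(seed)
    configs = []
    for _ in range(n):
        configs.append({
            "lr": rng.choice(LEARNING_RATES),
            "pattern_size": rng.choice(PATTERN_SIZES),
            "p_feat": rng.uniform(*P_RANGE),
            "p_sub": rng.uniform(*P_RANGE),
        })
    return configs


def summarize(scores: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation over repeated runs (e.g. five seeds)."""
    return statistics.mean(scores), statistics.stdev(scores)
```

In use, each sampled configuration would be trained with AdamW (for example `torch.optim.AdamW`) for 100 epochs at batch size 256 under five seeds, and `summarize` applied to the five resulting scores; the criterion used to pick the final values in Table 8 is not given in this entry.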