Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Transfer Learning on Edge Connecting Probability Estimation Under Graphon Model

Authors: Yuyao Wang, Yu-Hung Cheng, Debarghya Mukherjee, Huimin Cheng

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on synthetic datasets show our method consistently achieves lower estimation error than state-of-the-art alternatives. On real-world networks, it outperforms existing approaches in both graph classification via data augmentation and link prediction tasks.
Researcher Affiliation	Academia	Yuyao Wang, Yu-Hung Cheng, Debarghya Mukherjee, Huimin Cheng (all Boston University)
Pseudocode	Yes	Algorithm 1 summarizes our complete procedure, where the optimal transport can be calculated using either GW or its entropic variant EGW, and the choice depends primarily on the computational considerations.
Open Source Code	Yes	Our implementation is publicly available at https://github.com/olivia3395/GTRANS.
Open Datasets	Yes	For example, in the PROTEINS-Full dataset ([22]), each graph represents a protein structure and consists of only 25 nodes on average. Therefore, employing standard graphon estimation strategies may lead to poor accuracy. Fortunately, similar domains often offer larger graphs with related structures, e.g., the D&D dataset averaging 284 nodes per protein graph. Datasets. To address this challenge, we implemented GTRANS to enhance G-Mixup for graph classification by transferring knowledge from larger networks. In our experiments, we consider two co-actor graph datasets as targets: two-class IMDB-BINARY and three-class IMDB-MULTI, both characterized by small graph sizes. As candidate sources, we consider: (1) three-class COLLAB (average 74.49 nodes per graph), a collaboration network derived from scientific authorship data [39], and (2) two-class Reddit-Binary (average 429 nodes per graph), comprising Reddit user interaction threads [39]. Additionally, we examine a bioinformatics setting by transferring from two-class D&D (average 284.32 nodes) to two-class PROTEINS-Full (average 25.22 nodes) [39], both consisting of protein structure graphs.
Dataset Splits	Yes	For each target dataset, we split the dataset into train/validation/test data by 70%/10%/20%. We report the test accuracy on ten runs. Experimental Setup. We simulate a realistic link prediction task using a masking-based evaluation strategy following [65]. Specifically, we randomly mask a subset of edges in the upper triangular portion of the target adjacency matrix to form a test set. Let $M \in \{0, 1\}^{n \times n}$ be a masking matrix with $M_{ij} \sim \mathrm{Bernoulli}(1 - p)$, where $p$ is the test ratio. The observed matrix is $A^{\mathrm{mask}}_{ij} = M_{ij} A_{t,ij}$, meaning each edge is observed with probability $1 - p$. We set $p = 0.1$ unless otherwise specified.
Hardware Specification	No	The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. While the NeurIPS checklist indicates this information is in the Appendix, it is not present.
Software Dependencies	No	The paper mentions the GIN model, the Adam optimizer, and Python, but does not give version numbers for these or for any other key software components.
Experiment Setup	Yes	We adopt the same Graph Convolutional Network (GCN) architecture as used in [22], using the same hyperparameters and training procedures for all benchmark comparisons. Full implementation details are provided in Appendix F.1. For each target dataset, we split the dataset into train/validation/test data by 70%/10%/20%. We report the test accuracy on ten runs. We follow a modified version of the training configuration from [22]. Specifically, we train a GIN model for 200 epochs using the Adam optimizer with a fixed learning rate of 0.01. The mini-batch size is set to 128, and the hidden dimension is 64. Validation loss is monitored throughout training, and test accuracy is reported at the epoch with the best validation performance.
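
The split and masking procedures quoted in the Dataset Splits row can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked in the Open Source Code row for that). The function names `split_indices` and `mask_edges`, the `seed` arguments, and the choice to mirror the upper-triangular mask so the result stays symmetric are all assumptions made for illustration.

```python
import numpy as np


def split_indices(num_graphs, seed=0):
    """Shuffle graph indices and split them 70%/10%/20% (train/val/test).

    Hypothetical helper: integer arithmetic avoids float rounding surprises.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_graphs)
    n_train = num_graphs * 7 // 10
    n_val = num_graphs // 10
    train = perm[:n_train]
    val = perm[n_train:n_train + n_val]
    test = perm[n_train + n_val:]
    return train, val, test


def mask_edges(adj, p=0.1, seed=0):
    """Apply a Bernoulli(1 - p) mask to a square adjacency matrix.

    Entries are sampled on the strict upper triangle, as in the quoted setup.
    Mirroring the mask to the lower triangle is an assumption for undirected
    graphs. Returns (masked adjacency, mask); positions with mask == 0 in the
    upper triangle form the held-out test set.
    """
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    upper = np.triu_indices(n, k=1)
    # True with probability 1 - p, i.e. the entry stays observed.
    keep = rng.random(upper[0].size) >= p
    mask = np.zeros((n, n), dtype=int)
    mask[upper] = keep
    mask = mask + mask.T  # symmetric mask with a zero diagonal
    return mask * adj, mask
```

With `p = 0.1`, roughly 10% of the upper-triangular entries are hidden on average; the exact count varies with the seed.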