Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Effects of Dropout on Performance in Long-range Graph Learning Tasks

Authors: Jasraj Singh, Keyue Jiang, Brooks Paige, Laura Toni

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments on long-range synthetic and real-world datasets confirm the predicted limitations of existing edge-dropping and feature-dropping methods. Moreover, DropSens with GCN consistently outperforms graph rewiring techniques designed to mitigate over-squashing, suggesting that simple, targeted modifications can substantially improve a model's ability to capture long-range interactions.
Researcher Affiliation	Academia	Jasraj Singh1 Keyue Jiang2 Brooks Paige2 Laura Toni2 1Nanyang Technological University 2University College London
Pseudocode	Yes	In Listing 1, we present the DropSens implementation used in our experiments, relying mainly on SymPy [60].
Open Source Code	Yes	The code for reproducing the results in this work is available at https://github.com/ignasa007/Dropout-Effects-GNNs.
Open Datasets	Yes	Our experiments on long-range synthetic and real-world datasets confirm the predicted limitations of existing edge-dropping and feature-dropping methods. Moreover, DropSens with GCN consistently outperforms graph rewiring techniques designed to mitigate over-squashing, suggesting that simple, targeted modifications can substantially improve a model's ability to capture long-range interactions. and Synthetic ZINC [33] is a synthetic variant of the ZINC dataset [42], designed to study the effect of information mixing in graph learning. and In this work, we use Cora [58], CiteSeer [31] and PubMed [63] as representatives of homophilic datasets [53, 100], and Squirrel, Chameleon and Twitch DE [71] to represent heterophilic datasets [53].
Dataset Splits	Yes	For the Synthetic ZINC task, we use the train-val-test splits provided in PyG. For the homophilic (citation) networks, we use the full split [12], as provided in PyG, and for the heterophilic networks, we randomly sample 60% of the nodes for training, 16% for validation, and 24% for testing. On the other hand, for the graph classification tasks, we sample 80% of the graphs for training, and 10% each for validation and testing, following [8, 44].
Hardware Specification	Yes	All experiments were run on a server equipped with an Intel(R) Xeon(R) E5-2620 v3 CPU, 62 GB of RAM, 4 NVIDIA GeForce GTX TITAN X GPUs (12 GB VRAM each), and CUDA version 12.4.
Software Dependencies	Yes	All experiments were run on a server equipped with an Intel(R) Xeon(R) E5-2620 v3 CPU, 62 GB of RAM, 4 NVIDIA GeForce GTX TITAN X GPUs (12 GB VRAM each), and CUDA version 12.4.
Experiment Setup	Yes	We standardize most of the hyperparameters across all experiments to isolate the effect of random dropping. Specifically, we use symmetric normalization of the adjacency matrix to compute the edge weights for GCN, and we set the number of attention heads for GAT to 2 in order to keep the computational load manageable, while at the same time harnessing the expressiveness of the multi-headed self-attention mechanism. For the Synthetic ZINC dataset, we fix the size of the hidden representations at 16, while we fix them to 64 for all the real-world datasets. In all settings, a linear transformation is applied to the node features before message-passing. Afterwards, a bias term is added and then the ReLU nonlinearity is applied. Finally, a linear readout layer is used to compute the regressand (for regression tasks) or logits (for classification). ... The models are trained using the Adam optimizer [46]. On the Synthetic ZINC dataset, the models are trained with a learning rate of 2 × 10⁻³ and a weight decay of 1 × 10⁻⁴, for a total of 200 epochs. On the real-world datasets, we use a learning rate of 1 × 10⁻³ and no weight decay, following [8, 44]. Here, we cap the maximum number of epochs at 300.
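
The Dataset Splits row quotes a random 60% / 16% / 24% node split for the heterophilic networks. The following is a minimal sketch of such a split using only the Python standard library. It is not taken from the authors' repository: the function name `random_node_split`, the seed handling, and the rounding of the split sizes are assumptions made for illustration.

```python
import random


def random_node_split(num_nodes, train_frac=0.60, val_frac=0.16, seed=0):
    """Shuffle node indices and cut them into train/val/test index lists.

    Hypothetical helper: the fractions follow the quoted 60/16/24 split,
    while the seeding and rounding policy are illustrative assumptions.
    """
    rng = random.Random(seed)
    indices = list(range(num_nodes))
    rng.shuffle(indices)

    n_train = round(train_frac * num_nodes)
    n_val = round(val_frac * num_nodes)

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder, about 24% of the nodes
    return train, val, test


train_idx, val_idx, test_idx = random_node_split(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # 600 160 240
```

Assigning the remainder to the test set keeps the three subsets disjoint and exhaustive even when the fractions do not divide the node count evenly.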
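
The Experiment Setup row mentions symmetric normalization of the adjacency matrix to compute GCN edge weights. As a rough illustration, the sketch below assigns each edge (i, j) the weight 1 / sqrt(deg(i) · deg(j)) in plain Python. The helper name `symmetric_edge_weights` is hypothetical, and adding self-loops before normalizing is an assumption borrowed from the common GCN convention; the quoted text does not say whether the authors do so.

```python
import math


def symmetric_edge_weights(edges, num_nodes, add_self_loops=True):
    """Return {(i, j): 1 / sqrt(deg_i * deg_j)} for an undirected edge list.

    Hypothetical helper. Self-loops are an assumption (standard GCN
    convention), not something stated in the quoted setup.
    """
    directed = set()
    for i, j in edges:
        # Store both directions so the weights are symmetric.
        directed.add((i, j))
        directed.add((j, i))
    if add_self_loops:
        directed.update((v, v) for v in range(num_nodes))

    # Degree counts outgoing entries per node, including any self-loop.
    degree = [0] * num_nodes
    for i, _ in directed:
        degree[i] += 1

    return {(i, j): 1.0 / math.sqrt(degree[i] * degree[j]) for i, j in directed}


# Path graph 0 - 1 - 2: with self-loops the degrees are 2, 3, 2.
weights = symmetric_edge_weights([(0, 1), (1, 2)], num_nodes=3)
print(weights[(0, 1)], weights[(1, 1)])
```

In this toy example the edge between nodes 0 and 1 receives weight 1/sqrt(6), and the self-loop on node 1 receives weight 1/3.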