Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Over-squashing in Spatiotemporal Graph Neural Networks

Authors: Ivan Marisca, Jacob Bamberger, Cesare Alippi, Michael Bronstein

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We validate our findings on synthetic and real-world datasets, providing deeper insights into their operational dynamics and principled guidance for more effective designs.
Researcher Affiliation	Collaboration	(1) Università della Svizzera italiana, IDSIA, Lugano, Switzerland. (2) University of Oxford, Oxford, UK. (3) Politecnico di Milano, Milan, Italy. (4) AITHYRA, Vienna, Austria.
Pseudocode	No	The paper describes methods using mathematical equations and descriptions in paragraph text but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	The code to reproduce the experiments is available at github.com/marshka/spatiotemporal-oversquashing.
Open Datasets	Yes	We validate our findings on synthetic and real-world datasets... We empirically validate the effects of our proposed temporal convolution modifications through two synthetic sequence memory tasks: COPYFIRST and COPYLAST... PEMS-BAY contains 6 months of data from 325 traffic sensors in the San Francisco Bay Area, while METR-LA contains 4 months of analogous readings acquired from 207 detectors in the Los Angeles County Highway [19]... The EngRAD dataset contains hourly measurements of 5 different weather variables collected at 487 grid points in England from 2018 to 2020. ... All datasets used can either be generated by the codebase or downloaded from public sources.
Dataset Splits	Yes	We sequentially split the windows into 70%/10%/20% partitions for training, validation, and testing, respectively.
Hardware Specification	Yes	All experiments were conducted on a workstation running Ubuntu 22.04.5 LTS, equipped with two AMD EPYC 7513 CPUs and four NVIDIA RTX A5000 GPUs, each with 24 GB of memory.
Software Dependencies	No	All the code used for the experiments has been developed with Python [69] and relies on the following open-source libraries: PyTorch [70]; PyTorch Geometric [71]; Torch Spatiotemporal [72]; PyTorch Lightning [73]; Hydra [74]; NumPy [75]. We relied on Weights & Biases [76] for tracking and logging experiments. Specific version numbers for these software dependencies are not provided in the text.
Experiment Setup	Yes	We trained all models using the Adam [77] optimizer with an initial learning rate of 0.001, scheduled by a cosine annealing strategy that decays the learning rate to 10^-6 over the full training run. Gradients are clipped to a maximum norm of 5 to improve stability. For synthetic experiments, we trained for a maximum of 150 epochs with early stopping if the validation loss did not improve for 30 consecutive epochs, using mini-batches of size 32. To reduce computational time, we limit each epoch to the first 400 randomly sampled batches in the experiments for Fig. 4. For experiments on real-world datasets, we used the MAE as the loss function, trained for up to 200 epochs with a patience of 50 epochs, and in each epoch randomly sampled without replacement 300 mini-batches of size 64 from the training set.
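
The split, schedule, and early-stopping settings quoted in the table can be illustrated with a small standard-library Python sketch. It is not the authors' code (which relies on PyTorch and Torch Spatiotemporal). The names `sequential_split`, `cosine_lr`, and `EarlyStopping` are hypothetical, and the sketch assumes the cosine schedule is stepped once per epoch and that the final learning rate is 10^-6, neither of which the quoted text states explicitly.

```python
import math


def sequential_split(n_windows, train_pct=70, val_pct=10):
    """Split window indices in temporal order into train/val/test ranges.

    Integer percentages avoid floating-point rounding surprises.
    The remainder (20% by default) goes to the test partition.
    """
    n_train = n_windows * train_pct // 100
    n_val = n_windows * val_pct // 100
    return (
        range(0, n_train),
        range(n_train, n_train + n_val),
        range(n_train + n_val, n_windows),
    )


def cosine_lr(epoch, total_epochs, lr_max=1e-3, lr_min=1e-6):
    """Cosine annealing: lr_max at epoch 0, decaying to lr_min at total_epochs.

    Assumption: one schedule step per epoch.
    """
    cos = math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cos)


class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience):
        self.patience = patience
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one validation loss; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

With the synthetic-task settings from the table, one would call `cosine_lr(epoch, 150)` and `EarlyStopping(patience=30)`; for the real-world datasets, `cosine_lr(epoch, 200)` and `EarlyStopping(patience=50)`. Gradient clipping and batch sampling are omitted because they depend on the training framework.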