Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Spatio-Temporal Variational Gaussian Processes
Authors: Oliver Hamelijnck, William Wilkinson, Niki Loppi, Arno Solin, Theodoros Damoulas
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We examine the scalability and performance of ST-VGP and its variants. Throughout, we use a Matérn-3/2 kernel and optimise the hyperparameters by maximising the ELBO using Adam [32]." and "Table 1: NYC-CRIME (small) results. ST-SVGP = SVGP when Z is fixed. TRAIN Z MODEL RMSE NLPD ST-SVGP 3.02 0.13 1.72 0.04 SVGP 3.02 0.13 1.72 0.04 ST-SVGP 2.79 0.15 1.64 0.04 SVGP 2.94 0.12 1.65 0.05" |
| Researcher Affiliation | Collaboration | Oliver Hamelijnck The Alan Turing Institute / University of Warwick EMAIL William J. Wilkinson Aalto University EMAIL Niki A. Loppi NVIDIA EMAIL Arno Solin Aalto University EMAIL Theodoros Damoulas The Alan Turing Institute / University of Warwick EMAIL |
| Pseudocode | Yes | "Algorithm 1 Spatio-temporal sparse VGP" and "Algorithm 2 Sparse spatio-temporal smoothing" |
| Open Source Code | Yes | We provide JAX code for all methods at https://github.com/AaltoML/spatio-temporal-GPs. |
| Open Datasets | Yes | "NYC-CRIME Count Dataset We model crime numbers across New York City, USA (NYC), using daily complaint data from [1]. [1] 2014-2015 crimes reported in all 5 boroughs of New York City. https://www.kaggle.com/adamschroeder/crimes-new-york-city." and "Using hourly data from the London air quality network [29] between January 2019 and April 2019... [29] Imperial College London. Londonair London air quality network (LAQN). https://www.londonair.org.uk, 2020." |
| Dataset Splits | Yes | We use 5-fold cross-validation (i.e., 80-20 train-test split), train for 500 iterations (except for AIR-QUALITY where we train for 300) and report RMSE, negative log predictive density (NLPD, see App. K.1) and average per-iteration training times on CPU and GPU. |
| Hardware Specification | No | The paper mentions running experiments on 'CPU and GPU' and refers to 'computational resources provided by the Aalto Science-IT project and CSC IT Center for Science, Finland', but does not provide specific hardware details like CPU or GPU models. |
| Software Dependencies | No | The paper mentions 'JAX code' but does not specify version numbers for JAX or any other software libraries required for replication. |
| Experiment Setup | Yes | We use learning rates of ρ = 0.01, β = 1 in the conjugate case, and ρ = 0.01, β = 0.1 in the non-conjugate case. We train for 500 iterations (except for AIR-QUALITY where we train for 300) and report RMSE, negative log predictive density (NLPD, see App. K.1) and average per-iteration training times on CPU and GPU. ... SVGP with 2000, 2500, 5000, and 8000 inducing points with mini-batch sizes of 600, 800, 2000, and 3000 respectively. |
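
The evaluation protocol quoted in the Dataset Splits row (5-fold cross-validation, reporting RMSE and NLPD) can be sketched as below. This is a minimal illustration and not the authors' code: the helper names `k_fold_indices`, `rmse`, and `gaussian_nlpd` are hypothetical, the folds are contiguous and unshuffled for simplicity, and the NLPD assumes a Gaussian predictive distribution. The paper defines its own NLPD in App. K.1, which may differ, for example for count data.

```python
import math


def k_fold_indices(n, k=5):
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    # Spread any remainder over the first n % k folds.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def rmse(y_true, y_pred):
    """Root mean squared error between observations and predictions."""
    sq_err = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sq_err / len(y_true))


def gaussian_nlpd(y_true, mean, var):
    """Average negative log density of y_true under N(mean, var).

    Assumption: Gaussian predictive marginals; not taken from the paper.
    """
    total = 0.0
    for t, m, v in zip(y_true, mean, var):
        total += 0.5 * math.log(2.0 * math.pi * v) + 0.5 * (t - m) ** 2 / v
    return total / len(y_true)


if __name__ == "__main__":
    # With k=5, each fold holds out 20% of the points (an 80-20 split).
    for train, test in k_fold_indices(10, k=5):
        print(len(train), len(test))
```

With `k=5`, every fold trains on 80% of the points and tests on the remaining 20%, which matches the "80-20 train-test split" wording in the table.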