Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Change Surfaces for Expressive Multidimensional Changepoints and Counterfactual Prediction

Authors: William Herlands, Daniel B. Neill, Hannes Nickisch, Andrew Gordon Wilson

JMLR 2019

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "Using two large spatio-temporal datasets we employ GPCS to discover and characterize complex changes that can provide scientific and policy relevant insights. Specifically, we analyze twentieth century measles incidence across the United States and discover previously unknown heterogeneous changes after the introduction of the measles vaccine. Additionally, we apply the model to requests for lead testing kits in New York City, discovering distinct spatial and demographic patterns." Section 4 is titled "Experiments".

Researcher Affiliation | Collaboration | William Herlands, EMAIL ..., Carnegie Mellon University; Daniel B. Neill, EMAIL ..., New York University; Hannes Nickisch, EMAIL ..., Digital Imaging, Philips Research Hamburg; Andrew Gordon Wilson, EMAIL ..., Cornell University.

Pseudocode | Yes | Algorithm 1 ("Initialize RKS w(x) by optimizing a simplified model with RBF kernels") and Algorithm 2 ("Initialize spectral mixture kernels") are presented on pages 17 and 18, respectively.

Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository.

Open Datasets | Yes | "We use yearly counts of accidents from Jarrett (1979)."; "All data were taken from the 2014 American Community Survey 5 year average at the zip code level (Census Bureau, 2014b)."; "Incidence rates per 100,000 population based on historical population estimates are made publicly available by Project Tycho (van Panhuis et al., 2013)."

Dataset Splits | No | "Using synthetic data, we create a predictive test by splitting the data into training and testing sets." This statement is too general and does not provide specific details on the splits (e.g., percentages, sample counts, or methodology).

Hardware Specification | No | The paper does not report specific hardware details, such as GPU or CPU models or other machine specifications, used to run its experiments.

Software Dependencies | No | The paper mentions the "KISS-GP framework (Wilson and Nickisch, 2015)" and the "Gaussian processes for machine learning (gpml) toolbox (Rasmussen and Nickisch, 2010)", but does not specify the software libraries or packages, with version numbers, that would be needed to replicate the experiments.

Experiment Setup | Yes | "Therefore, we use m1 = 100 and m2 = 20 for Algorithm 1."; "Specifically, we let Λ = (range(x)/2)^2, σ0 = std(y), and σn = mean(|y|)/10."; "For each method we average the results for 10 random restarts."
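The hyperparameter initialization quoted in the Experiment Setup row can be sketched in Python. This is a minimal illustration of the stated heuristics only (Λ = (range(x)/2)^2, σ0 = std(y), σn = mean(|y|)/10); the function name and the synthetic inputs are assumptions, not code from the paper.

```python
import numpy as np

def init_gp_hyperparameters(x, y):
    """Heuristic GP hyperparameter initialization as quoted in the report.

    Lambda  : squared half-range of the inputs, per input dimension
    sigma0  : signal standard deviation, std(y)
    sigma_n : noise standard deviation, mean(|y|) / 10
    """
    Lambda = ((x.max(axis=0) - x.min(axis=0)) / 2.0) ** 2
    sigma0 = float(np.std(y))
    sigma_n = float(np.mean(np.abs(y)) / 10.0)
    return Lambda, sigma0, sigma_n

# Synthetic 1-D example (illustrative data, not from the paper)
x = np.linspace(0.0, 10.0, 100).reshape(-1, 1)
y = np.sin(x).ravel()
Lambda, sigma0, sigma_n = init_gp_hyperparameters(x, y)
```

For the example above, the input range is 10, so the initial lengthscale parameter is (10/2)^2 = 25 for the single input dimension.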