Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
A general linear-time inference method for Gaussian Processes on one dimension
Authors: Jackson Loper, David Blei, John P. Cunningham, Liam Paninski
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We develop parallelized algorithms for performing inference and learning in the LEG model, test the algorithm on real and synthetic data, and demonstrate scaling to datasets with billions of samples. |
| Researcher Affiliation | Academia | Jackson Loper (EMAIL), Data Science Institute, Columbia University, New York, New York 10027; David Blei (EMAIL), Data Science Institute, Departments of Statistics and Computer Science, Columbia University, New York, New York 10027; John P. Cunningham (EMAIL), Department of Statistics, Mortimer B. Zuckerman Mind Brain Behavior Institute, Grossman Center for the Statistics of Mind, Columbia University, New York, New York 10027; Liam Paninski (EMAIL), Departments of Statistics and Neuroscience, Mortimer B. Zuckerman Mind Brain Behavior Institute, Grossman Center for the Statistics of Mind, Columbia University, New York, New York 10027 |
| Pseudocode | Yes | Algorithm 1: decompose; Algorithm 2: halfsolve; Algorithm 3: backhalfsolve; Algorithm 4: invblocks; Algorithm 5: expm; Algorithm 6: expmgrad |
| Open Source Code | Yes | We implemented these algorithms in the pure-python leggps package (https://github.com/jacksonloper/leg-gps). |
| Open Datasets | Yes | To check whether LEG parameterization can capture periodic covariances, we turned to the Mauna Loa CO2 dataset. For the last sixty years, the monthly average atmosphere CO2 concentrations at the Mauna Loa Observatory in Hawaii have been recorded (Keeling and Whorf, 2005). We looked at observations from neural spiking data (Grosmark and Buzsáki, 2016). |
| Dataset Splits | Yes | For each target kernel, we wanted to minimize the maximum absolute difference between the target kernel and LEG kernels of various ranks. ... We considered several kernels: a squared exponential kernel, a triangle kernel, a rational quadratic kernel with α = 2, a sinc kernel, and a Matérn kernel of order 1. In each case we draw eighteen thousand observations, each taken 0.1 units apart from the next. ... selected the model with the best likelihood on six thousand held-out samples. To test the ability of the LEG model to learn these kinds of structures from data, we trained a rank-5 LEG kernel on all the data before 1980 and all the data after 2000. |
| Hardware Specification | Yes | In each case we used an m5.24xlarge machine on Amazon Web Services (AWS). We used m5.24xlarge machines to represent CPU computation (96 cores from Intel Xeon Platinum 8000 processors, 384 gigabytes of memory), and p3.2xlarge machines to represent GPU computation (one Nvidia Tesla V100 processor with 16 gigabytes of memory). |
| Software Dependencies | No | The paper mentions software like "leggps package", "celerite2 package", "gpytorch package", "TensorFlow 2", "pylds", and "pyro.distributions.GaussianHMM" but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | Perform parameter estimation by gradient descent on the likelihood. ... random initialization was sufficient to find good parameter estimates. We exhaustively searched over values of this rank hyperparameter and selected the model with the best likelihood on six thousand held-out samples. To test the ability of the LEG model to learn these kinds of structures from data, we trained a rank-5 LEG kernel on all the data before 1980 and all the data after 2000. |
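The held-out split quoted above (train on all data before 1980 and after 2000, leaving the 1980–2000 gap for evaluation) can be sketched as a simple boolean-mask split. This is a minimal illustration, not the paper's code: the array names and the synthetic CO2-like values below are invented for the example.

```python
import numpy as np

# Synthetic stand-in for monthly Mauna Loa CO2 observations (illustrative values only).
years = np.arange(1958, 2020, 1 / 12)                      # monthly timestamps
co2 = 315 + 0.12 * (years - 1958) + 3 * np.sin(2 * np.pi * years)

# Train on everything before 1980 and after 2000; the 1980-2000 gap is held out,
# so the fitted kernel must interpolate two decades it never saw.
train_mask = (years < 1980) | (years >= 2000)
t_train, y_train = years[train_mask], co2[train_mask]
t_test, y_test = years[~train_mask], co2[~train_mask]
```

A split like this probes extrapolation of periodic structure: the model only succeeds on the gap if the learned kernel captures the seasonal cycle rather than memorizing local values.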