Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
A general linear-time inference method for Gaussian Processes on one dimension
Authors: Jackson Loper, David Blei, John P. Cunningham, Liam Paninski
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We develop parallelized algorithms for performing inference and learning in the LEG model, test the algorithm on real and synthetic data, and demonstrate scaling to datasets with billions of samples. |
| Researcher Affiliation | Academia | Jackson Loper (EMAIL), Data Science Institute, Columbia University, New York, New York 10027; David Blei (EMAIL), Data Science Institute, Departments of Statistics and Computer Science, Columbia University, New York, New York 10027; John P. Cunningham (EMAIL), Department of Statistics, Mortimer B. Zuckerman Mind Brain Behavior Institute, Grossman Center for the Statistics of Mind, Columbia University, New York, New York 10027; Liam Paninski (EMAIL), Departments of Statistics and Neuroscience, Mortimer B. Zuckerman Mind Brain Behavior Institute, Grossman Center for the Statistics of Mind, Columbia University, New York, New York 10027 |
| Pseudocode | Yes | Algorithm 1: decompose; Algorithm 2: halfsolve; Algorithm 3: backhalfsolve; Algorithm 4: invblocks; Algorithm 5: expm; Algorithm 6: expmgrad |
| Open Source Code | Yes | We implemented these algorithms in the pure-python leggps package (https://github.com/jacksonloper/leg-gps). |
| Open Datasets | Yes | To check whether LEG parameterization can capture periodic covariances, we turned to the Mauna Loa CO2 dataset. For the last sixty years, the monthly average atmosphere CO2 concentrations at the Mauna Loa Observatory in Hawaii have been recorded (Keeling and Whorf, 2005). We looked at observations from neural spiking data (Grosmark and Buzsáki, 2016). |
| Dataset Splits | Yes | For each target kernel, we wanted to minimize the maximum absolute difference between the target kernel and LEG kernels of various ranks. ... We considered several kernels: a squared exponential kernel, a triangle kernel, a rational quadratic kernel with α = 2, a sinc kernel, and a Matérn kernel of order 1. In each case we draw eighteen thousand observations, each taken 0.1 units apart from the next. ... selected the model with the best likelihood on six thousand held-out samples. To test the ability of the LEG model to learn these kinds of structures from data, we trained a rank-5 LEG kernel on all the data before 1980 and all the data after 2000. |
| Hardware Specification | Yes | In each case we used an m5.24xlarge machine on Amazon Web Services (AWS). We used m5.24xlarge machines to represent CPU computation (96 cores from Intel Xeon Platinum 8000 processors, 384 gigabytes of memory), and p3.2xlarge machines to represent GPU computation (one Nvidia Tesla V100 processor with 16 gigabytes of memory). |
| Software Dependencies | No | The paper mentions software like "leggps package", "celerite2 package", "gpytorch package", "TensorFlow 2", "pylds", and "pyro.distributions.GaussianHMM" but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | Perform parameter estimation by gradient descent on the likelihood. ... random initialization was sufficient to find good parameter estimates. We exhaustively searched over values of this rank hyperparameter and selected the model with the best likelihood on six thousand held-out samples. To test the ability of the LEG model to learn these kinds of structures from data, we trained a rank-5 LEG kernel on all the data before 1980 and all the data after 2000. |
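The held-out split quoted above (train on all data before 1980 and after 2000, leaving the 1980–2000 gap for evaluation) can be sketched as a simple boolean-mask split. This is a minimal illustration, not the paper's code: the array names and the synthetic CO2-like values below are invented for the example.

```python
import numpy as np

# Synthetic stand-in for monthly Mauna Loa CO2 observations (illustrative values only).
years = np.arange(1958, 2020, 1 / 12)                      # monthly timestamps
co2 = 315 + 0.12 * (years - 1958) + 3 * np.sin(2 * np.pi * years)

# Train on everything before 1980 and after 2000; the 1980-2000 gap is held out,
# so the fitted kernel must interpolate two decades it never saw.
train_mask = (years < 1980) | (years >= 2000)
t_train, y_train = years[train_mask], co2[train_mask]
t_test, y_test = years[~train_mask], co2[~train_mask]
```

A split like this probes extrapolation of periodic structure: the model only succeeds on the gap if the learned kernel captures the seasonal cycle rather than memorizing local values.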