Preconditioning Kernel Matrices
Authors: Kurt Cutajar, Michael Osborne, John Cunningham, Maurizio Filippone
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate datasets over a range of problem size and dimensionality. Because PCG is exact in the limit of iterations (unlike approximate techniques), we demonstrate a tradeoff between accuracy and computational effort that improves beyond state-of-the-art approximation and factorization approaches. In this section, we provide an empirical exploration of these preconditioners in a practical setting. We begin by considering three datasets for regression from the UCI repository (Asuncion & Newman, 2007), namely the Concrete dataset (n = 1030, d = 8), the Power Plant dataset (n = 9568, d = 4), and the Protein dataset (n = 45730, d = 9). |
| Researcher Affiliation | Academia | Kurt Cutajar KURT.CUTAJAR@EURECOM.FR EURECOM, Department of Data Science Michael A. Osborne MOSB@ROBOTS.OX.AC.UK University of Oxford, Department of Engineering Science John P. Cunningham JPC2181@COLUMBIA.EDU Columbia University, Department of Statistics Maurizio Filippone MAURIZIO.FILIPPONE@EURECOM.FR EURECOM, Department of Data Science |
| Pseudocode | Yes | Algorithm 1 The Preconditioned CG Algorithm, adapted from (Golub & Van Loan, 1996) Require: data X, vector v, convergence threshold ϵ, initial vector x0, maximum no. of iterations T |
| Open Source Code | Yes | Code to replicate all results in this paper is available at http://github.com/mauriziofilippone/preconditioned_GPs |
| Open Datasets | Yes | We begin by considering three datasets for regression from the UCI repository (Asuncion & Newman, 2007), namely the Concrete dataset (n = 1030, d = 8), the Power Plant dataset (n = 9568, d = 4), and the Protein dataset (n = 45730, d = 9). GP classification: Spam dataset (n = 4601, d = 57) and EEG dataset (n = 14979, d = 14). |
| Dataset Splits | Yes | All methods are initialized from the same set of kernel parameters, and the curves are averaged over 5 folds (3 for the Protein and EEG datasets). |
| Hardware Specification | Yes | For the sake of integrity, we ran each method in the comparison individually on a workstation with Intel Xeon E5-2630 CPU having 16 cores and 128GB RAM. |
| Software Dependencies | No | The paper states that "The CG, PCG and CHOL approaches have been implemented in R;" but does not specify the R version or any library/package versions critical for reproducibility. GPstuff is mentioned as a comparison target, likewise without a version. |
| Experiment Setup | Yes | The convergence threshold is set to ϵ² = n · 10⁻¹⁰ so as to roughly accept an average error of 10⁻⁵ on each element of the solution. We focus on an isotropic RBF variant of the kernel in eq. 1, fixing the marginal variance σ² to one. We vary the lengthscale parameter l and the noise variance λ in log10 scale. We set the stepsize to one. All methods are initialized from the same set of kernel parameters. |
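To make the Pseudocode and Experiment Setup rows concrete, below is a minimal sketch of the preconditioned CG loop in the style of Algorithm 1 (Golub & Van Loan, 1996), applied to a kernel system (K + λI)x = v with the paper's stopping rule ϵ² = n · 10⁻¹⁰. The Jacobi (diagonal) preconditioner used here is a hypothetical stand-in for illustration only; the paper studies more sophisticated preconditioners.

```python
import numpy as np

def pcg(A, v, M_inv, x0=None, eps2=1e-10, T=1000):
    """Preconditioned conjugate gradient for A x = v.

    M_inv(r) applies the inverse preconditioner to a vector.
    Iterates until the squared residual norm drops below eps2
    or T iterations are reached.
    """
    n = len(v)
    x = np.zeros(n) if x0 is None else x0.copy()
    r = v - A @ x        # initial residual
    z = M_inv(r)         # preconditioned residual
    p = z.copy()         # initial search direction
    rz = r @ z
    for _ in range(T):
        if r @ r < eps2:              # convergence check on ||r||^2
            break
        Ap = A @ p
        alpha = rz / (p @ Ap)         # exact line search step
        x += alpha * p
        r -= alpha * Ap
        z = M_inv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p     # update search direction
        rz = rz_new
    return x

# Toy problem: isotropic RBF kernel plus noise on random 1-D inputs
rng = np.random.default_rng(0)
n = 50
X = rng.standard_normal((n, 1))
K = np.exp(-0.5 * (X - X.T) ** 2) + 1e-2 * np.eye(n)  # K + lambda*I
v = rng.standard_normal(n)

# Jacobi preconditioner (illustrative stand-in), threshold eps2 = n * 1e-10
d = np.diag(K)
x = pcg(K, v, M_inv=lambda r: r / d, eps2=n * 1e-10)
```

Setting eps2 = n · 10⁻¹⁰ means the root-mean-square residual per element is roughly 10⁻⁵, matching the tolerance quoted in the Experiment Setup row.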