Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Kernel Partial Least Squares for Stationary Data

Authors: Marco Singer, Tatyana Krivobokova, Axel Munk

JMLR 2017 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	It is shown both theoretically and in simulations that long range dependence results in slower convergence rates. A protein dynamics example shows high predictive power of kernel partial least squares. [...] To validate the theoretical results of the previous sections, we conducted a simulation study.
Researcher Affiliation	Academia	Marco Singer EMAIL Institute for Mathematical Stochastics Georg-August-Universität Göttingen, 37077, Germany; Tatyana Krivobokova EMAIL Institute for Mathematical Stochastics Georg-August-Universität Göttingen, 37077, Germany; Axel Munk EMAIL Institute for Mathematical Stochastics Georg-August-Universität Göttingen, 37077, Germany
Pseudocode	No	The paper describes algorithms mathematically and refers to them (e.g., KPLS, KCG) but does not provide structured pseudocode blocks or algorithms.
Open Source Code	No	The paper does not contain any explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets	No	The paper mentions using a "protein dynamics example" and "T4 Lysozyme (T4L)" data for its application. While these are specific types of data, the paper does not provide concrete access information (e.g., a link, DOI, or specific repository) for the dataset used in their experiments. It describes the data source as "molecular dynamics simulations" but no public access is indicated.
Dataset Splits	Yes	The first 50% of the data form a training set to calculate the kernel partial least squares estimator and the remaining data are used for testing.
Hardware Specification	No	The paper does not contain any specific details about the hardware (e.g., CPU, GPU models) used to run the simulations or experiments.
Software Dependencies	No	The paper discusses various algorithms and mathematical frameworks but does not specify any software names with version numbers that were used for implementation (e.g., programming languages, libraries, solvers).
Experiment Setup	Yes	The reproducing kernel Hilbert space is chosen to correspond to the Gaussian kernel $k(x, y) = \exp(-l \|x - y\|^2)$, $x, y \in \mathbb{R}^d$, $l = 2$, for $d = 1$. [...] The parameter $l > 0$ is calculated via cross validation on the training set. In our evaluation we obtained $l = 10.22$. [...] a maximum of 40 iteration steps. [...] The source parameter is taken to be $r = 4.5$ and we consider the function $f(x) = 4.37^{-1}\{3L_4(x, 4) - 2L_4(x, 3) + 1.5L_4(x, 9)\}$, $x \in \mathbb{R}$. [...] The residuals $\varepsilon^{(j)}_1, \dots, \varepsilon^{(j)}_n$ are generated as independent standard normally distributed random variables and independent of $\{X^{(j)}_t\}_{t=1}^n$. The response is defined as $y^{(j)}_t = f(X^{(j)}_t) + \eta\,\varepsilon^{(j)}_t$, $t = 1, \dots, n$, $j = 1, \dots, M$, with $\eta = 1/16$.
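
The quoted setup can be illustrated with a minimal Python sketch of three pieces it describes: the Gaussian kernel with l = 2, the response model y = f(X) + η ε with η = 1/16, and the split in which the first 50% of the observations are used for training. This is not the authors' code. The names `gaussian_kernel`, `chronological_split`, and `placeholder_f` are hypothetical. `placeholder_f` stands in for the paper's regression function, and the i.i.d. standard normal inputs stand in for the stationary process the paper simulates; both are assumptions made only so the example runs. The quoted split comes from the protein dynamics application, while the noise model comes from the simulation study; they are combined here purely for illustration.

```python
import numpy as np


def gaussian_kernel(x, y, l=2.0):
    """Gram matrix with entries exp(-l * ||x_i - y_j||^2).

    x has shape (n, d) and y has shape (m, d); the result has shape (n, m).
    """
    x = np.atleast_2d(x)
    y = np.atleast_2d(y)
    # Pairwise squared Euclidean distances via broadcasting.
    sq_dist = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-l * sq_dist)


def chronological_split(X, y, train_fraction=0.5):
    """Use the first part of the series for training and the rest for testing."""
    n_train = int(len(X) * train_fraction)
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]


def placeholder_f(x):
    # Hypothetical stand-in for the paper's regression function f.
    return np.sin(x)


rng = np.random.default_rng(0)
n = 200
eta = 1.0 / 16.0

# Assumption: i.i.d. inputs in d = 1; the paper uses a stationary
# (possibly long range dependent) process instead.
X = rng.standard_normal((n, 1))
eps = rng.standard_normal(n)  # independent standard normal residuals
y = placeholder_f(X[:, 0]) + eta * eps

X_train, y_train, X_test, y_test = chronological_split(X, y)
K_train = gaussian_kernel(X_train, X_train, l=2.0)  # train-by-train Gram matrix
K_test = gaussian_kernel(X_test, X_train, l=2.0)    # test-by-train cross kernel
```

A kernel partial least squares estimator would be fitted on `K_train` and `y_train` and evaluated through `K_test`. In the application, the kernel parameter l would be chosen by cross validation on the training part rather than fixed at 2.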