Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Kernel Partial Least Squares for Stationary Data
Authors: Marco Singer, Tatyana Krivobokova, Axel Munk
JMLR 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | It is shown both theoretically and in simulations that long range dependence results in slower convergence rates. A protein dynamics example shows high predictive power of kernel partial least squares. [...] To validate the theoretical results of the previous sections, we conducted a simulation study. |
| Researcher Affiliation | Academia | Marco Singer EMAIL Institute for Mathematical Stochastics Georg-August-Universität Göttingen, 37077, Germany; Tatyana Krivobokova EMAIL Institute for Mathematical Stochastics Georg-August-Universität Göttingen, 37077, Germany; Axel Munk EMAIL Institute for Mathematical Stochastics Georg-August-Universität Göttingen, 37077, Germany |
| Pseudocode | No | The paper describes algorithms mathematically and refers to them (e.g., KPLS, KCG) but does not provide structured pseudocode blocks or algorithms. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | No | The paper mentions using a "protein dynamics example" and "T4 Lysozyme (T4L)" data for its application. While these are specific types of data, the paper does not provide concrete access information (e.g., a link, DOI, or specific repository) for the dataset used in their experiments. It describes the data source as "molecular dynamics simulations" but no public access is indicated. |
| Dataset Splits | Yes | The first 50% of the data form a training set to calculate the kernel partial least squares estimator and the remaining data are used for testing. |
| Hardware Specification | No | The paper does not contain any specific details about the hardware (e.g., CPU, GPU models) used to run the simulations or experiments. |
| Software Dependencies | No | The paper discusses various algorithms and mathematical frameworks but does not specify any software names with version numbers that were used for implementation (e.g., programming languages, libraries, solvers). |
| Experiment Setup | Yes | The reproducing kernel Hilbert space is chosen to correspond to the Gaussian kernel $k(x, y) = \exp(-l \lVert x - y \rVert^2)$, $x, y \in \mathbb{R}^d$, $l = 2$, for $d = 1$. [...] The parameter $l > 0$ is calculated via cross validation on the training set. In our evaluation we obtained $l = 10.22$. [...] a maximum of 40 iteration steps. [...] The source parameter is taken to be $r = 4.5$ and we consider the function $f(x) = 4.37^{-1}\{3L_4(x, 4) - 2L_4(x, 3) + 1.5L_4(x, 9)\}$, $x \in \mathbb{R}$. [...] The residuals $\varepsilon^{(j)}_1, \dots, \varepsilon^{(j)}_n$ are generated as independent standard normally distributed random variables and independent of $\{X^{(j)}_t\}_{t=1}^n$. The response is defined as $y^{(j)}_t = f(X^{(j)}_t) + \eta \varepsilon^{(j)}_t$, $t = 1, \dots, n$, $j = 1, \dots, M$, with $\eta = 1/16$. |
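
The setup details quoted in the table (a Gaussian kernel with parameter l, a split in which the first 50% of the data form the training set, and a response built from additive standard normal noise scaled by η = 1/16) can be illustrated with a minimal Python sketch. This is not the authors' code, which the paper does not release. The function names `gaussian_kernel`, `chronological_split`, and `noisy_response` are hypothetical, and the regression function and input series in the usage lines are placeholders rather than the paper's f and its stationary process.

```python
import numpy as np


def gaussian_kernel(x, y, l=2.0):
    """Gram matrix of k(x, y) = exp(-l * ||x - y||^2); inputs hold one point per row."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = x.reshape(len(x), -1)  # treat a 1-D input as n points in dimension d = 1
    y = y.reshape(len(y), -1)
    sq_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-l * sq_dist)


def chronological_split(data, train_fraction=0.5):
    """Keep the time order: the first part is for training, the remainder for testing."""
    n_train = int(len(data) * train_fraction)
    return data[:n_train], data[n_train:]


def noisy_response(f, x, eta=1.0 / 16.0, rng=None):
    """y_t = f(x_t) + eta * eps_t with independent standard normal eps_t."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(len(x))
    return f(x) + eta * eps


# Usage with placeholder inputs (assumptions, not the paper's data-generating process).
rng = np.random.default_rng(0)
x = rng.standard_normal(200)            # placeholder: independent draws, no dependence structure
y = noisy_response(np.sin, x, rng=rng)  # placeholder regression function
x_train, x_test = chronological_split(x)
K_train = gaussian_kernel(x_train, x_train, l=2.0)
```

The split is taken in time order rather than at random because the table describes the first half of the observations as the training set. How the paper's cross validation for l is carried out on that half is not specified in the table, so it is left out here.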