Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Fourier Sparse Leverage Scores and Approximate Kernel Learning
Authors: Tamás Erdélyi, Cameron Musco, Christopher Musco
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study a 2-D Gaussian process regression problem, representative of typical data-intensive function interpolation tasks, showing that our oblivious sketching method substantially improves on the original random Fourier features method on which it is based [RR07]. We compare our method against the classical RFF method on a kernel ridge regression problem involving precipitation data from Slovakia [NM13], a benchmark GIS data set. See Figure 3 for a description. The regression solution requires computing (K + λI)⁻¹y, where y is a vector of training data. Doing so with a direct method is slow since K is large and dense, so an iterative solver is necessary. However, when cross validation is used to choose a kernel width σ and regularization parameter λ, the optimal choices lead to a poorly conditioned system, which leads to slow convergence. Results on preconditioning are shown in Figure 4. |
| Researcher Affiliation | Academia | Tamás Erdélyi, Texas A&M University; Cameron Musco, University of Massachusetts Amherst; Christopher Musco, New York University |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper discusses implementing the method and its simplifications but does not provide a link to open-source code or explicitly state that the code for the described methodology is released. |
| Open Datasets | Yes | We compare our method against the classical RFF method on a kernel ridge regression problem involving precipitation data from Slovakia [NM13], a benchmark GIS data set. |
| Dataset Splits | Yes | Our goal is to approximate this precipitation function based on 6400 training samples from randomly selected locations (visualized as black dots)... when cross validation is used to choose a kernel width σ and regularization parameter λ, the optimal choices lead to a poorly conditioned system, which leads to slow convergence. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'sklearn [PVG+11]' and that the method 'can be implemented in a few lines of code', but it does not specify version numbers for any software dependencies. |
| Experiment Setup | No | The paper mentions using '6400 training samples' and that 'cross validation is used to choose a kernel width σ and regularization parameter λ', but it does not provide specific hyperparameter values (e.g., exact σ, λ values used, learning rates, batch sizes) or detailed system-level training configurations to ensure reproducibility. |
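The Research Type entry describes solving (K + λI)⁻¹y iteratively and using random Fourier features to speed this up. The following is a minimal sketch of the *classical* RFF baseline [RR07] used as a preconditioner for conjugate gradient, not the authors' leverage-score-based method, and their code is not released. Everything here is an assumption for illustration: the Gaussian kernel parameterization, the synthetic 2-D data standing in for the Slovakia precipitation set, and the values of n, m, σ, and λ. The helper names (`random_fourier_features`, `rff_preconditioner`, `solve_krr`) are hypothetical; only the NumPy and SciPy calls are real library APIs.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.sparse.linalg import LinearOperator, cg


def gaussian_kernel(X, Z, sigma):
    """Gaussian kernel k(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    sq = (X**2).sum(1)[:, None] + (Z**2).sum(1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def random_fourier_features(X, m, sigma, rng):
    """Classical RFF map [RR07]: returns Z (n x m) with Z @ Z.T ~= K."""
    # For the Gaussian kernel, frequencies are drawn from N(0, sigma^{-2} I).
    W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], m))
    b = rng.uniform(0.0, 2.0 * np.pi, size=m)
    return np.sqrt(2.0 / m) * np.cos(X @ W + b)


def rff_preconditioner(Z, lam):
    """LinearOperator applying (Z Z^T + lam I)^{-1} via the Woodbury identity."""
    n, m = Z.shape
    # Only an m x m system is factored: (Z^T Z + lam I).
    factor = cho_factor(Z.T @ Z + lam * np.eye(m))

    def apply(v):
        # (Z Z^T + lam I)^{-1} v = (v - Z (Z^T Z + lam I)^{-1} Z^T v) / lam
        return (v - Z @ cho_solve(factor, Z.T @ v)) / lam

    return LinearOperator((n, n), matvec=apply, dtype=float)


def solve_krr(K, y, lam, M=None):
    """Solve (K + lam I) alpha = y with (optionally preconditioned) CG."""
    iters = [0]

    def count(_xk):
        # Called by cg once per iteration; used only to count iterations.
        iters[0] += 1

    A = K + lam * np.eye(K.shape[0])
    alpha, info = cg(A, y, M=M, maxiter=20 * len(y), callback=count)
    return alpha, info, iters[0]


# Hypothetical synthetic 2-D data (NOT the [NM13] precipitation data).
rng = np.random.default_rng(0)
n, m, sigma, lam = 500, 300, 0.15, 1e-2
X = rng.uniform(0.0, 1.0, size=(n, 2))
y = np.sin(2 * np.pi * X[:, 0]) * np.cos(2 * np.pi * X[:, 1]) + 0.1 * rng.normal(size=n)

K = gaussian_kernel(X, X, sigma)
Z = random_fourier_features(X, m, sigma, rng)

alpha_plain, info_plain, it_plain = solve_krr(K, y, lam)
alpha_pre, info_pre, it_pre = solve_krr(K, y, lam, M=rff_preconditioner(Z, lam))
print(f"plain CG: {it_plain} iterations, RFF-preconditioned CG: {it_pre} iterations")
```

Comparing the two iteration counts gives a rough sense of how much a classical RFF preconditioner helps on a given instance. The Woodbury identity keeps each preconditioner application at O(nm) after a one-time O(nm² + m³) factorization. The paper reports that its modified sampling improves on classical RFF in the poorly conditioned regime that cross-validated σ and λ produce; this sketch does not implement that modification.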