Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Recursive Sampling for the Nyström Method

Authors: Cameron Musco, Christopher Musco

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically we show that it finds more accurate kernel approximations in less time than popular techniques such as classic Nyström approximation and the random Fourier features method. We conclude with an empirical evaluation of our recursive RLS-Nyström method.
Researcher Affiliation | Academia | Cameron Musco, MIT EECS, EMAIL; Christopher Musco, MIT EECS, EMAIL
Pseudocode | Yes | Algorithm 1 RLS-NYSTRÖM SAMPLING. input: x1, . . . , xn ∈ X, kernel matrix K, ridge parameter λ > 0, failure probability δ ∈ (0, 1/8) ... Algorithm 2 RECURSIVE RLS-NYSTRÖM. input: x1, . . . , xm ∈ X, kernel function K : X × X → R, ridge λ > 0, failure prob. δ ∈ (0, 1/32)
Open Source Code | No | The paper does not explicitly state that source code for the described method is released, nor does it link to a repository. It mentions existing implementations such as WEKA, scikit-learn, and IBM Libskylark, but not code for the authors' own method.
Open Datasets | Yes | We evaluate RLS-Nyström on the Year Prediction MSD, Covertype, Cod-RNA, and Adult datasets downloaded from the UCI ML Repository [Lic13] and [UKM06].
Dataset Splits | No | The paper mentions "training points" and "cross validation" but does not specify exact training, validation, or test splits (e.g., 80/10/10 percentages or sample counts) needed for reproducibility.
Hardware Specification | No | The paper reports runtimes in seconds and makes general comparisons, but does not specify the hardware used for the experiments, such as GPU models, CPU models, or cloud computing instances.
Software Dependencies | No | The paper mentions using a Gaussian kernel and notes that "WEKA data mining software," "scikit-learn," and "IBM Libskylark" widely implement the baselines, but it does not list any software names with version numbers required to reproduce the experiments.
Experiment Setup | Yes | We use a variant of Algorithm 2 where, instead of choosing a regularization parameter λ, the user sets a sample size s and λ is automatically determined such that s = O(d_eff^λ/δ). ... We use a Gaussian kernel for all tests, with the width parameter σ selected via cross validation on regression and classification tasks.
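For context on what the classified algorithms compute, the following is a minimal sketch of the classic Nyström approximation with a Gaussian kernel, which the paper uses as a baseline. It is not the authors' RLS-Nyström method: landmarks here are drawn uniformly at random as a stand-in, whereas RLS-Nyström samples them by ridge leverage scores. All function names and parameters below are illustrative choices, not from the paper.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Y**2, axis=1)[None, :]
                - 2.0 * X @ Y.T)
    return np.exp(-sq_dists / (2.0 * sigma**2))

def nystrom_approx(X, landmark_idx, sigma, reg=1e-10):
    """Classic Nystrom approximation K_hat = C W^+ C^T from landmark columns.

    C is the n x s block of kernel evaluations against the landmarks and
    W is the s x s kernel matrix among the landmarks; a tiny ridge `reg`
    stabilizes the pseudoinverse numerically.
    """
    C = gaussian_kernel(X, X[landmark_idx], sigma)          # n x s
    W = C[landmark_idx]                                     # s x s
    return C @ np.linalg.pinv(W + reg * np.eye(len(landmark_idx))) @ C.T

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Uniform landmark sampling; RLS-Nystrom would instead sample
# proportionally to ridge leverage scores.
idx = rng.choice(200, size=50, replace=False)
K = gaussian_kernel(X, X, sigma=1.0)
K_hat = nystrom_approx(X, idx, sigma=1.0)
rel_err = np.linalg.norm(K - K_hat) / np.linalg.norm(K)
```

The approximation is exact on the landmark block and degrades gracefully elsewhere; replacing the uniform draw of `idx` with leverage-score sampling is what gives the paper's method its accuracy advantage at a fixed sample size s.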