Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Revisiting the Nyström Method for Improved Large-scale Machine Learning
Authors: Alex Gittens, Michael W. Mahoney
JMLR 2016 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our main results consist of an empirical evaluation of the performance quality and running time of sampling and projection methods on a diverse suite of SPSD matrices. Our results highlight complementary aspects of sampling versus projection methods... Thus, we complement our empirical results with a suite of worst-case theoretical bounds for both random sampling and random projection methods. |
| Researcher Affiliation | Academia | Alex Gittens EMAIL Michael W. Mahoney EMAIL International Computer Science Institute and Department of Statistics University of California, Berkeley Berkeley, CA |
| Pseudocode | Yes | Algorithm 1: Algorithm (Drineas et al., 2012, Algorithm 1) for approximating the leverage scores ℓ_i of an n × d matrix A, where n ≫ d, to within a multiplicative factor of 1 ± ϵ. The running time of the algorithm is O(nd ln(ln n) + ndϵ⁻² ln n + d²ϵ⁻²(ln n)² ln d). Algorithm 2: Algorithm (Drineas et al., 2012, Algorithm 4) for approximating the leverage scores (relative to the best rank-k approximation to A) of a general n × d matrix A with those of a matrix that is close by in the spectral norm (or the Frobenius norm if q = 0). This algorithm runs in time O(ndkq) + T1, where T1 is the running time of Algorithm 1. Algorithm 3: Algorithm for approximating the leverage scores (relative to the best rank-k approximation to A) of a general n × d matrix A with those of a matrix that is close by in the spectral norm. This is a modified version of Algorithm 2, in which the random projection is implemented with an SRFT rather than a Gaussian random matrix, and where the number of iterations q is prespecified. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It mentions that "Our analysis is timely for several reasons. First, in spite of the empirical successes of Nyström-based and other randomized low-rank methods, existing theory for the Nyström method is quite modest... Third, in recent years several high-quality numerical implementations of randomized matrix algorithms for least-squares and low-rank approximation problems have been developed (Avron et al., 2010; Meng et al., 2014; Woolfe et al., 2008; Rokhlin et al., 2009; Martinsson et al., 2011)." but this refers to prior work's implementations, not the authors' own code. No repository link or explicit statement of a code release is found. |
| Open Datasets | Yes | Table 4 provides summary statistics for the data sets used in our empirical evaluation. We consider four classes of matrices commonly encountered in machine learning and data analysis applications... The data sets used in our empirical evaluation (Leskovec et al., 2007; Klimt and Yang, 2004; Guyon et al., 2005; Gustafson et al., 2006; Nielsen et al., 2002; Corke, 1996; Asuncion and Newman, 2012). |
| Dataset Splits | No | The paper focuses on evaluating low-rank matrix approximation algorithms and their reconstruction errors, rather than training predictive machine learning models. It specifies parameters like 'number of column samples ℓ' and 'average errors observed over 30 trials' for the algorithm's internal sampling mechanisms, but it does not define traditional 'training/test/validation' splits for the datasets themselves in the context of supervised learning tasks. |
| Hardware Specification | Yes | All of our computations were conducted using 64-bit MATLAB R2012a under Ubuntu on a 2.6 GHz quad-core Intel i7 machine with 6 GB of RAM. |
| Software Dependencies | Yes | All of our computations were conducted using 64-bit MATLAB R2012a under Ubuntu on a 2.6 GHz quad-core Intel i7 machine with 6 GB of RAM. |
| Experiment Setup | Yes | The min/mean/max ratios were computed using 30 trials for each combination of ℓ and sketching method... The observed errors were taken to be the average errors over 30 runs of the SPSD approximation algorithms... We consider three regimes for ℓ, the number of column samples used to construct the sketch: ℓ = O(k), ℓ = O(k ln k), and ℓ = O(k ln n)... The power scheme is a version of Algorithm 3 where r = k and q is determined by monitoring the convergence of the leverage scores of A^(2q+1)Π and terminating when the change in the leverage scores between iterations, as measured in the infinity norm, is smaller than 10⁻²... The parameters in Algorithm 1 were taken to be r1 = ϵ⁻² ln(dδ⁻¹)(ln(nδ⁻¹))² and r2 = ϵ⁻²(ln n + ln δ⁻¹) with ϵ = 1 and δ = 1/10. |
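For context on the SPSD sketching model that the quotes above evaluate, a minimal Nyström approximation can be sketched as follows. This is an illustrative assumption, not the authors' experimental code: it uses uniform column sampling (one of several sketching methods the paper compares), and the function name `nystrom_approx`, the matrix sizes, and the choice of ℓ are all hypothetical.

```python
import numpy as np

def nystrom_approx(A, ell, seed=None):
    """Nystrom approximation of an SPSD matrix A from ell sampled columns.

    Samples ell columns uniformly at random (illustrative choice; the paper
    also studies leverage-score sampling and random projections), forms
    C = A[:, idx] and the intersection block W = A[idx][:, idx], and returns
    the approximation C @ pinv(W) @ C.T.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    idx = rng.choice(n, size=ell, replace=False)
    C = A[:, idx]            # n x ell block of sampled columns
    W = C[idx, :]            # ell x ell intersection block
    return C @ np.linalg.pinv(W) @ C.T

# Example: a small SPSD Gram matrix of exact rank 5.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
A = X @ X.T                  # SPSD, rank 5
A_hat = nystrom_approx(A, ell=10, seed=1)
rel_err = np.linalg.norm(A - A_hat, "fro") / np.linalg.norm(A, "fro")
```

Because A here has exact rank 5 and ℓ = 10 generic columns almost surely span its range, the Nyström approximation is exact up to floating-point error; for the full-rank kernel and Laplacian matrices in the paper, the error instead decays with ℓ at the rates the experiments measure.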