Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Input-Sparsity Low Rank Approximation in Schatten Norm
Authors: Yi Li, David Woodruff
ICML 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Experiments The contribution of our work is primarily theoretical: an input sparsity time algorithm for low-rank approximation for any Schatten p-norm. In this section, nevertheless, we give an empirical verification of the advantage of our algorithm on both synthetic and real-world data. We focus on the most important case of the nuclear norm, i.e., p = 1. ... In Table 1 we report the median of ε1, the median of ε2, the median running time of both algorithms, among 50 independent runs for each k, and the median running time of a full SVD of A among 10 runs. |
| Researcher Affiliation | Academia | 1School of Physical and Mathematical Sciences, Nanyang Technological University 2Department of Computer Science, Carnegie Mellon University. |
| Pseudocode | Yes | Algorithm 1 Outline of the algorithm for finding a low-rank approximation |
| Open Source Code | No | The paper does not provide any links or statements about releasing the source code for their described methodology. It mentions using MATLAB but does not offer access to their implementation. |
| Open Datasets | Yes | KOS data. For real-world data, we use a word frequency dataset, named KOS, from UC Irvine.4 The matrix represents word frequencies in blogs and has dimension 3430 6906 with 353160 non-zero entries. (footnote 4 points to https://archive.ics.uci.edu/ml/datasets/Bag+of+Words) |
| Dataset Splits | No | The paper describes generating synthetic data and running the algorithm multiple times, but it does not specify explicit training, validation, or test dataset splits, nor does it mention cross-validation or predefined benchmark splits. |
| Hardware Specification | Yes | 3All tests are run under MATLAB 2019b on a machine of Intel Core i7-6550U CPU@2.20GHz with 2 cores. |
| Software Dependencies | Yes | 3All tests are run under MATLAB 2019b on a machine of Intel Core i7-6550U CPU@2.20GHz with 2 cores. |
| Experiment Setup | Yes | We adopt a simpler version of Algorithm 1 by taking S to be a COUNT-SKETCH matrix of target dimension k2 and both R and T to be identity matrices of appropriate dimension. For the Frobenius-norm solution, we also take S to be a COUNT-SKETCH matrix of target dimension k2. We choose n = 3000 and generate a random n n matrix A of independent entries, each of which is uniform in [0, 1] with probability 0.05 and 0 with probability 0.95. Since the regime of interest is k n, we vary k among {5, 10, 20}. |