Algorithms for $\ell_p$ Low-Rank Approximation

Authors: Flavio Chierichetti, Sreenivas Gollapudi, Ravi Kumar, Silvio Lattanzi, Rina Panigrahy, David P. Woodruff

ICML 2017

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "We test the performance of our algorithms, for p = 1 and p = ∞, on real and synthetic data and show that they produce low-rank approximations that are substantially better than what the SVD (i.e., p = 2) would obtain. In this section, we show the effectiveness of Algorithm 2 compared to the SVD. We run our comparison both on synthetic as well as real data sets."

Researcher Affiliation | Collaboration | "1. Sapienza University, Rome, Italy. Work done in part while visiting Google. Supported in part by a Google Focused Research Award, by the ERC Starting Grant DMAP 680153, and by the SIR Grant RBSI14Q743. 2. Google, Mountain View, CA. 3. Google, Zurich, Switzerland. 4. IBM Almaden, San Jose, CA."

Pseudocode | Yes | "Algorithm 1: Enumerating and selecting k columns of A. Algorithm 2: A (k + 1)-approximation to k-LRAp. Algorithm 3: Selecting O(k log m) columns of A. Algorithm 4: An algorithm that transforms an O(k log m)-rank matrix decomposition into a k-rank matrix decomposition without inflating the error too much."

Open Source Code | No | No concrete access to source code for the described methodology was found. The paper references a technical report for a full version but does not state that code is provided.

Open Datasets | Yes | "For the real data sets, we use matrices from the FIDAP set [2] and a word frequency dataset from UC Irvine [3]." [2] http://math.nist.gov/MatrixMarket/data/SPARSKIT/fidap/fidap005.html [3] https://archive.ics.uci.edu/ml/datasets/Bag+of+Words

Dataset Splits | No | The paper uses real and synthetic datasets but does not explicitly provide training, validation, or test splits (e.g., specific percentages or sample counts). It describes an experimental procedure involving sampling columns, but not data partitioning.

Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models or memory sizes) used for running the experiments are mentioned in the paper.

Software Dependencies | No | No specific ancillary software (e.g., library or solver names with version numbers, such as Python 3.8 or CPLEX 12.4) needed to replicate the experiments is provided.

Experiment Setup | No | The paper describes the general experimental approach, such as "repeatedly sample k columns, a few thousand times, uniformly at random", but does not provide specific setup details such as hyperparameter values, training configurations, or system-level settings.
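The sampling heuristic quoted above ("repeatedly sample k columns ... uniformly at random") can be sketched as a short loop: draw k columns of A, fit the rest of the matrix against them, and keep the sample with the smallest entrywise ℓ1 error. This is a minimal illustration, not the paper's algorithm: it uses ordinary least squares as a stand-in for the ℓp-regression fit, and the function name and interface are hypothetical.

```python
import numpy as np

def best_k_column_sample(A, k, trials=1000, seed=None):
    """Repeatedly sample k columns of A uniformly at random, fit all of A
    against the sampled columns, and return the sample with the smallest
    entrywise l1 error.  Least squares is used for the fit as a simple
    proxy for the l_p regression step in the paper's algorithms."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    best_err, best_cols = np.inf, None
    for _ in range(trials):
        cols = rng.choice(m, size=k, replace=False)
        U = A[:, cols]                                # n x k candidate basis
        V, *_ = np.linalg.lstsq(U, A, rcond=None)     # k x m coefficients
        err = np.abs(A - U @ V).sum()                 # entrywise l1 error
        if err < best_err:
            best_err, best_cols = err, cols
    return best_cols, best_err
```

On an exactly rank-k matrix, almost any k sampled columns span the column space, so the best error found is essentially zero; on noisy data the loop trades trials for approximation quality, which matches the paper's "a few thousand times" phrasing.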