Algorithms for $\ell_p$ Low-Rank Approximation

Authors: Flavio Chierichetti, Sreenivas Gollapudi, Ravi Kumar, Silvio Lattanzi, Rina Panigrahy, David P. Woodruff

ICML 2017

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "We test the performance of our algorithms, for p = 1 and p = ∞, on real and synthetic data and show that they produce low-rank approximations that are substantially better than what the SVD (i.e., p = 2) would obtain. In this section, we show the effectiveness of Algorithm 2 compared to the SVD. We run our comparison both on synthetic as well as real data sets."

Researcher Affiliation | Collaboration | "1. Sapienza University, Rome, Italy. Work done in part while visiting Google. Supported in part by a Google Focused Research Award, by the ERC Starting Grant DMAP 680153, and by the SIR Grant RBSI14Q743. 2. Google, Mountain View, CA. 3. Google, Zurich, Switzerland. 4. IBM Almaden, San Jose, CA."

Pseudocode | Yes | "Algorithm 1: Enumerating and selecting k columns of A. Algorithm 2: A (k + 1)-approximation to k-LRAp. Algorithm 3: Selecting O(k log m) columns of A. Algorithm 4: An algorithm that transforms an O(k log m)-rank matrix decomposition into a k-rank matrix decomposition without inflating the error too much."

Open Source Code | No | No concrete access to source code for the described methodology was found. The paper references a technical report for a full version but does not state that code is provided.

Open Datasets | Yes | "For the real data sets, we use matrices from the FIDAP set [2] and a word frequency dataset from UC Irvine [3]." [2] http://math.nist.gov/MatrixMarket/data/SPARSKIT/fidap/fidap005.html [3] https://archive.ics.uci.edu/ml/datasets/Bag+of+Words

Dataset Splits | No | The paper uses real and synthetic datasets but does not explicitly provide training, validation, or test splits (e.g., specific percentages or sample counts). It describes an experimental procedure involving sampling columns, but not data partitioning.

Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models or memory sizes) used for running the experiments are mentioned in the paper.

Software Dependencies | No | No specific ancillary software (e.g., library or solver names with version numbers, such as Python 3.8 or CPLEX 12.4) needed to replicate the experiments is provided.

Experiment Setup | No | The paper describes the general experimental approach, such as "repeatedly sample k columns, a few thousand times, uniformly at random", but does not provide specific setup details such as hyperparameter values, training configurations, or system-level settings.
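The sampling heuristic quoted above ("repeatedly sample k columns ... uniformly at random") can be sketched as a short loop: draw k columns of A, fit the rest of the matrix against them, and keep the sample with the smallest entrywise ℓ1 error. This is a minimal illustration, not the paper's algorithm: it uses ordinary least squares as a stand-in for the ℓp-regression fit, and the function name and interface are hypothetical.

```python
import numpy as np

def best_k_column_sample(A, k, trials=1000, seed=None):
    """Repeatedly sample k columns of A uniformly at random, fit all of A
    against the sampled columns, and return the sample with the smallest
    entrywise l1 error.  Least squares is used for the fit as a simple
    proxy for the l_p regression step in the paper's algorithms."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    best_err, best_cols = np.inf, None
    for _ in range(trials):
        cols = rng.choice(m, size=k, replace=False)
        U = A[:, cols]                                # n x k candidate basis
        V, *_ = np.linalg.lstsq(U, A, rcond=None)     # k x m coefficients
        err = np.abs(A - U @ V).sum()                 # entrywise l1 error
        if err < best_err:
            best_err, best_cols = err, cols
    return best_cols, best_err
```

On an exactly rank-k matrix, almost any k sampled columns span the column space, so the best error found is essentially zero; on noisy data the loop trades trials for approximation quality, which matches the paper's "a few thousand times" phrasing.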