Algorithms for $\ell_p$ Low-Rank Approximation
Authors: Flavio Chierichetti, Sreenivas Gollapudi, Ravi Kumar, Silvio Lattanzi, Rina Panigrahy, David P. Woodruff
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test the performance of our algorithms, for p = 1 and p = ∞, on real and synthetic data and show that they produce low-rank approximations that are substantially better than what the SVD (i.e., p = 2) would obtain. In this section, we show the effectiveness of Algorithm 2 compared to the SVD. We run our comparison both on synthetic as well as real data sets. |
| Researcher Affiliation | Collaboration | Sapienza University, Rome, Italy (work done in part while visiting Google; supported in part by a Google Focused Research Award, by the ERC Starting Grant DMAP 680153, and by the SIR Grant RBSI14Q743); Google, Mountain View, CA; Google, Zurich, Switzerland; IBM Almaden, San Jose, CA. |
| Pseudocode | Yes | Algorithm 1: Enumerating and selecting k columns of A. Algorithm 2: A (k + 1)-approximation to k-LRA_p. Algorithm 3: Selecting O(k log m) columns of A. Algorithm 4: An algorithm that transforms an O(k log m)-rank matrix decomposition into a k-rank matrix decomposition without inflating the error too much. |
| Open Source Code | No | No concrete access to source code for the described methodology was found. The paper references a technical report for a full version but does not state that code is provided. |
| Open Datasets | Yes | For the real data sets, we use matrices from the FIDAP set (http://math.nist.gov/MatrixMarket/data/SPARSKIT/fidap/fidap005.html) and a word frequency dataset from UC Irvine (https://archive.ics.uci.edu/ml/datasets/Bag+of+Words). A hedged loading sketch follows the table. |
| Dataset Splits | No | The paper uses real and synthetic datasets but does not explicitly provide information on training, validation, or test dataset splits (e.g., specific percentages or sample counts). It describes its experimental procedure involving sampling columns, but not data partitioning. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running experiments were mentioned in the paper. |
| Software Dependencies | No | No specific ancillary software details (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment were provided. |
| Experiment Setup | No | The paper describes the general experimental procedure, such as 'repeatedly sample k columns, a few thousand times, uniformly at random', but does not provide specific setup details such as hyperparameter values, training configurations, or system-level settings. A minimal sketch of this sampling procedure follows the table. |
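
As a concrete illustration of the column-sampling procedure quoted above (and of the column-selection idea behind Algorithms 1–2), the following is a minimal Python sketch, not the authors' code: it repeatedly samples k columns of A, fits the second factor under the entrywise $\ell_1$ norm via least-absolute-deviations linear programs, and keeps the best candidate, with the SVD (p = 2) as the comparison baseline. The function names, the SciPy `linprog` formulation, and the trial count are illustrative assumptions; the p = ∞ case would use a min–max (Chebyshev) fit instead.

```python
# Hedged sketch of the column-sampling heuristic described above (not the paper's code):
# repeatedly sample k columns of A, fit the other factor under the entrywise l1 norm,
# and keep the best candidate. Each l1 fit is a least-absolute-deviations LP.
import numpy as np
from scipy.optimize import linprog

def lad_fit(U, b):
    """Solve min_x ||U @ x - b||_1 as an LP with slack variables t >= |U @ x - b|."""
    n, k = U.shape
    c = np.concatenate([np.zeros(k), np.ones(n)])            # minimize sum of slacks t
    A_ub = np.block([[U, -np.eye(n)], [-U, -np.eye(n)]])      # U x - t <= b and -U x - t <= -b
    b_ub = np.concatenate([b, -b])
    bounds = [(None, None)] * k + [(0, None)] * n             # x free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:k]

def sampled_l1_lowrank(A, k, trials=1000, seed=None):
    """Best (U, V) over `trials` uniformly random choices of k columns of A."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    best_err, best = np.inf, None
    for _ in range(trials):
        cols = rng.choice(m, size=k, replace=False)
        U = A[:, cols]
        V = np.column_stack([lad_fit(U, A[:, j]) for j in range(m)])
        err = np.abs(A - U @ V).sum()
        if err < best_err:
            best_err, best = err, (U, V)
    return best, best_err

def svd_l1_error(A, k):
    """l1 error of the rank-k SVD truncation (the p = 2 baseline used in the paper)."""
    Uk, s, Vt = np.linalg.svd(A, full_matrices=False)
    Ak = Uk[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    return np.abs(A - Ak).sum()
```

With a small synthetic matrix, e.g. `A = np.random.default_rng(0).standard_normal((50, 40))` corrupted by a few large outliers, comparing `sampled_l1_lowrank(A, k=5)[1]` against `svd_l1_error(A, 5)` reproduces the kind of SVD-versus-$\ell_1$ comparison the experiments describe, though the exact matrices and trial counts used in the paper are not specified.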
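
For the two real data sets listed under "Open Datasets", the sketch below shows one way to load them, assuming the files have already been downloaded from the linked pages. The specific file names (`fidap005.mtx`, `docword.kos.txt`) follow the standard Matrix Market and UCI Bag-of-Words conventions and are assumptions here, not choices stated in the paper.

```python
# Hedged loading sketch for the real data sets; file names are assumptions based on
# the usual Matrix Market and UCI Bag-of-Words formats, not taken from the paper.
from scipy.io import mmread
from scipy.sparse import coo_matrix

# FIDAP matrix in Matrix Market format (e.g. fidap005.mtx after unpacking).
fidap = mmread("fidap005.mtx").toarray()

# UCI Bag of Words files (docword.*.txt) start with three header lines
# (number of docs, words, nonzeros) followed by "docID wordID count" triples.
def load_bag_of_words(path):
    with open(path) as f:
        n_docs, n_words, _nnz = (int(f.readline()) for _ in range(3))
        rows, cols, vals = [], [], []
        for line in f:
            d, w, c = map(int, line.split())
            rows.append(d - 1)
            cols.append(w - 1)
            vals.append(c)
    return coo_matrix((vals, (rows, cols)), shape=(n_docs, n_words)).tocsr()

bow = load_bag_of_words("docword.kos.txt")  # hypothetical choice of corpus file
```

Either matrix (densified, or restricted to a column subsample for tractability) can then be fed to the sampling sketch above.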