Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Algorithms for $\ell_p$ Low-Rank Approximation
Authors: Flavio Chierichetti, Sreenivas Gollapudi, Ravi Kumar, Silvio Lattanzi, Rina Panigrahy, David P. Woodruff
ICML 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test the performance of our algorithms, for p = 1 and p = ∞, on real and synthetic data and show that they produce low-rank approximations that are substantially better than what the SVD (i.e., p = 2) would obtain. In this section, we show the effectiveness of Algorithm 2 compared to the SVD. We run our comparison both on synthetic as well as real data sets. |
| Researcher Affiliation | Collaboration | 1Sapienza University, Rome, Italy. Work done in part while visiting Google. Supported in part by a Google Focused Research Award, by the ERC Starting Grant DMAP 680153, and by the SIR Grant RBSI14Q743. 2Google, Mountain View, CA 3Google, Zurich, Switzerland 4IBM Almaden, San Jose, CA. |
| Pseudocode | Yes | Algorithm 1 Enumerating and selecting k columns of A. Algorithm 2 A (k + 1)-approximation to k-LRAp. Algorithm 3 Selecting O(k log m) columns of A. Algorithm 4 An algorithm that transforms an O(k log m)-rank matrix decomposition into a k-rank matrix decomposition without inflating the error too much. |
| Open Source Code | No | No concrete access to source code for the described methodology was found. The paper references a technical report for a full version but does not state that code is provided. |
| Open Datasets | Yes | For the real data sets, we use matrices from the FIDAP set (http://math.nist.gov/MatrixMarket/data/SPARSKIT/fidap/fidap005.html) and a word frequency dataset from UC Irvine (https://archive.ics.uci.edu/ml/datasets/Bag+of+Words). |
| Dataset Splits | No | The paper uses real and synthetic datasets but does not explicitly provide information on training, validation, or test dataset splits (e.g., specific percentages or sample counts). It describes its experimental procedure involving sampling columns, but not data partitioning. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running experiments were mentioned in the paper. |
| Software Dependencies | No | No specific ancillary software details (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment were provided. |
| Experiment Setup | No | The paper describes the general approach for running experiments, such as 'repeatedly sample k columns, a few thousand times, uniformly at random', but does not provide specific experimental setup details such as hyperparameter values, training configurations, or system-level settings. |
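The experimental procedure quoted above ("repeatedly sample k columns, a few thousand times, uniformly at random") can be sketched as follows. This is a hedged illustration, not the authors' code: the function name `sampled_l1_lra` and all parameter defaults are assumptions, and where the paper fits coefficients with $\ell_p$ regression, this sketch substitutes ordinary least squares ($\ell_2$) via `numpy.linalg.lstsq` and then scores candidates by entrywise $\ell_1$ error.

```python
import numpy as np

def sampled_l1_lra(A, k, trials=2000, rng=None):
    """Heuristic sketch of the column-sampling experiment: repeatedly
    sample k columns of A uniformly at random, fit the remaining columns
    on their span, and keep the factorization with the smallest entrywise
    l1 error.

    Simplification: coefficients V are computed by least squares (l2)
    rather than the l_p regression used in the paper; only the scoring
    of candidates uses the l1 objective.
    """
    rng = np.random.default_rng(rng)
    n, m = A.shape
    best_err, best = np.inf, None
    for _ in range(trials):
        cols = rng.choice(m, size=k, replace=False)
        U = A[:, cols]                              # n x k sampled columns
        V, *_ = np.linalg.lstsq(U, A, rcond=None)   # k x m coefficients
        err = np.abs(A - U @ V).sum()               # entrywise l1 error
        if err < best_err:
            best_err, best = err, (U, V)
    return best, best_err
```

On an exactly rank-k matrix, any sampled set of k independent columns spans the column space, so the returned error is near zero; on noisy data the loop simply keeps the best of the random candidates, which is the comparison against the SVD reported in the paper.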