Average Case Column Subset Selection for Entrywise $\ell_1$-Norm Loss
Authors: Zhao Song, David Woodruff, Peilin Zhong
NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 3 Experiments. We tested Algorithm 2 on both synthetic and real datasets. For each rank-k experiment, we chose a high rank matrix b A Rn d, applied top-k SVD to b A and obtained a rank-k matrix A as our ground truth matrix. For our synthetic data experiments, the matrix b A R500 500 was generated at random, where each entry was drawn uniformly from {0, 1, , 9}. For real datasets, we chose isolet3 (617 1559) or mfeat4 (651 2000) as b A [29]. We tested two different noise distributions. One distribution is the standard Lévy 1.1-stable distribution [30]. Another distribution is constructed from the standard Cauchy distribution, i.e., to draw a sample from the constructed distribution, we draw a sample from the Cauchy distribution, keep the sign unchanged, and take the 1 1.1-th power of the absolute value. ... Figure 1: Empirical results. |
| Researcher Affiliation | Academia | Zhao Song University of Washington magic.linuxkde@gmail.com David P. Woodruff Carnegie Mellon University dwoodruf@cs.cmu.edu Peilin Zhong Columbia University pz2225@columbia.edu |
| Pseudocode | Yes | Algorithm 1 ℓ1-Low Rank Approximation with Input Assumption. Algorithm 2 Median Heuristic. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | For real datasets, we chose isolet3 (617 1559) or mfeat4 (651 2000) as b A [29]. 3https://archive.ics.uci.edu/ml/datasets/isolet 4https://archive.ics.uci.edu/ml/datasets/Multiple+Features |
| Dataset Splits | No | The paper uses datasets but does not provide specific details on training, validation, or test splits or their proportions. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers. |
| Experiment Setup | Yes | For Algorithm 2, we set s = min(50, n/k ). For all of algorithms we repeated the experiment the same number of times and compared the best solution obtained by each algorithm. For our synthetic data experiments, the matrix b A R500 500 was generated at random, where each entry was drawn uniformly from {0, 1, , 9}. ... To construct the noise matrix Rn d, we drew a matrix b where each entry is an i.i.d. sample from one of the two noise distributions, and then scaled the noise: = b A 1 20 n d. |