Average Case Column Subset Selection for Entrywise $\ell_1$-Norm Loss

Authors: Zhao Song, David Woodruff, Peilin Zhong

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental 3 Experiments. We tested Algorithm 2 on both synthetic and real datasets. For each rank-k experiment, we chose a high rank matrix b A Rn d, applied top-k SVD to b A and obtained a rank-k matrix A as our ground truth matrix. For our synthetic data experiments, the matrix b A R500 500 was generated at random, where each entry was drawn uniformly from {0, 1, , 9}. For real datasets, we chose isolet3 (617 1559) or mfeat4 (651 2000) as b A [29]. We tested two different noise distributions. One distribution is the standard Lévy 1.1-stable distribution [30]. Another distribution is constructed from the standard Cauchy distribution, i.e., to draw a sample from the constructed distribution, we draw a sample from the Cauchy distribution, keep the sign unchanged, and take the 1 1.1-th power of the absolute value. ... Figure 1: Empirical results.
Researcher Affiliation Academia Zhao Song University of Washington magic.linuxkde@gmail.com David P. Woodruff Carnegie Mellon University dwoodruf@cs.cmu.edu Peilin Zhong Columbia University pz2225@columbia.edu
Pseudocode Yes Algorithm 1 ℓ1-Low Rank Approximation with Input Assumption. Algorithm 2 Median Heuristic.
Open Source Code No The paper does not provide concrete access to source code for the methodology described.
Open Datasets Yes For real datasets, we chose isolet3 (617 1559) or mfeat4 (651 2000) as b A [29]. 3https://archive.ics.uci.edu/ml/datasets/isolet 4https://archive.ics.uci.edu/ml/datasets/Multiple+Features
Dataset Splits No The paper uses datasets but does not provide specific details on training, validation, or test splits or their proportions.
Hardware Specification No The paper does not provide specific details about the hardware used for running experiments.
Software Dependencies No The paper does not specify software dependencies with version numbers.
Experiment Setup Yes For Algorithm 2, we set s = min(50, n/k ). For all of algorithms we repeated the experiment the same number of times and compared the best solution obtained by each algorithm. For our synthetic data experiments, the matrix b A R500 500 was generated at random, where each entry was drawn uniformly from {0, 1, , 9}. ... To construct the noise matrix Rn d, we drew a matrix b where each entry is an i.i.d. sample from one of the two noise distributions, and then scaled the noise: = b A 1 20 n d.