reproducibilityindex.ai

Average Case Column Subset Selection for Entrywise $\ell_1$-Norm Loss

Authors: Zhao Song, David Woodruff, Peilin Zhong

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	3 Experiments. We tested Algorithm 2 on both synthetic and real datasets. For each rank-k experiment, we chose a high rank matrix b A Rn d, applied top-k SVD to b A and obtained a rank-k matrix A as our ground truth matrix. For our synthetic data experiments, the matrix b A R500 500 was generated at random, where each entry was drawn uniformly from {0, 1, , 9}. For real datasets, we chose isolet3 (617 1559) or mfeat4 (651 2000) as b A [29]. We tested two different noise distributions. One distribution is the standard Lévy 1.1-stable distribution [30]. Another distribution is constructed from the standard Cauchy distribution, i.e., to draw a sample from the constructed distribution, we draw a sample from the Cauchy distribution, keep the sign unchanged, and take the 1 1.1-th power of the absolute value. ... Figure 1: Empirical results.
Researcher Affiliation	Academia	Zhao Song University of Washington magic.linuxkde@gmail.com David P. Woodruff Carnegie Mellon University dwoodruf@cs.cmu.edu Peilin Zhong Columbia University pz2225@columbia.edu
Pseudocode	Yes	Algorithm 1 ℓ1-Low Rank Approximation with Input Assumption. Algorithm 2 Median Heuristic.
Open Source Code	No	The paper does not provide concrete access to source code for the methodology described.
Open Datasets	Yes	For real datasets, we chose isolet3 (617 1559) or mfeat4 (651 2000) as b A [29]. 3https://archive.ics.uci.edu/ml/datasets/isolet 4https://archive.ics.uci.edu/ml/datasets/Multiple+Features
Dataset Splits	No	The paper uses datasets but does not provide specific details on training, validation, or test splits or their proportions.
Hardware Specification	No	The paper does not provide specific details about the hardware used for running experiments.
Software Dependencies	No	The paper does not specify software dependencies with version numbers.
Experiment Setup	Yes	For Algorithm 2, we set s = min(50, n/k ). For all of algorithms we repeated the experiment the same number of times and compared the best solution obtained by each algorithm. For our synthetic data experiments, the matrix b A R500 500 was generated at random, where each entry was drawn uniformly from {0, 1, , 9}. ... To construct the noise matrix Rn d, we drew a matrix b where each entry is an i.i.d. sample from one of the two noise distributions, and then scaled the noise: = b A 1 20 n d.