Optimal Analysis of Subset-Selection Based L_p Low-Rank Approximation

Authors: Chen Dan, Hong Wang, Hongyang Zhang, Yuchen Zhou, Pradeep K. Ravikumar

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical We study the low rank approximation problem of any given matrix A over Rn m and Cn m in entry-wise ℓp loss, that is, finding a rank-k matrix X such that A X p is minimized. Unlike the traditional ℓ2 setting, this particular variant is NP-Hard. We show that the algorithm of column subset selection, which was an algorithmic foundation of many existing algorithms, enjoys approximation ratio (k + 1)1/p for 1 p 2 and (k + 1)1 1/p for p 2. This improves upon the previous O(k + 1) bound for p 1 [1]. We complement our analysis with lower bounds; these bounds match our upper bounds up to constant 1 when p 2. At the core of our techniques is an application of Riesz-Thorin interpolation theorem from harmonic analysis, which might be of independent interest to other algorithmic designs and analysis more broadly.
Researcher Affiliation Academia Chen Dan Carnegie Mellon University cdan@cs.cmu.edu Hong Wang Princeton University Hong.Wang1991@gmail.com Hongyang Zhang Toyota Technological Institute at Chicago honyanz@ttic.edu Yuchen Zhou University of Wisconsin, Madison yuchenzhou@stat.wisc.edu Pradeep Ravikumar Carnegie Mellon University pradeepr@cs.cmu.edu
Pseudocode Yes Algorithm 1 A cp,k approximation to problem (1) by column subset selection. Algorithm 2 [1] SELECTCOLUMNS (k, A): Selecting O(k log m) columns of A. Algorithm 3 [1]An algorithm that transforms an O(k log m)-rank matrix factorization into a k-rank matrix factorization without inflating the error too much.
Open Source Code No The paper does not provide any statement or link regarding the availability of open-source code for the methodology described.
Open Datasets No The paper is theoretical and does not conduct experiments with specific datasets. Therefore, no information about publicly available training datasets or access is provided.
Dataset Splits No The paper is theoretical and does not conduct experiments with dataset splits. Therefore, no information about validation splits is provided.
Hardware Specification No The paper is theoretical and does not describe any experiments. Therefore, no hardware specifications for running experiments are mentioned.
Software Dependencies No The paper is theoretical and does not describe any experiments requiring specific software dependencies with version numbers for reproducibility.
Experiment Setup No The paper is theoretical and does not describe any experiments or their setup. Therefore, no specific experimental setup details like hyperparameters or training configurations are provided.