Regularized Weighted Low Rank Approximation

Authors: Frank Ban, David Woodruff, Richard Zhang

NeurIPS 2019

Reproducibility Variable Result LLM Response
Research Type Experimental The goal of our experiments was to show that sketching down to the statistical dimension can be applied to regularized weighted low rank approximation without sacrificing overall accuracy in the objective function, as our theory predicts. We combine sketching with a common practical alternating minimization heuristic for solving regularized weighted low rank approximation, rather than implementing a polynomial system solver. At each step in the algorithm, we have a candidate U and V and we perform a best response: we either update U to give the best regularized weighted low rank approximation cost for V, or we update V to give the best regularized weighted low rank approximation cost for U. We used a synthetic dataset and several real datasets (connectus, NIPS, landmark, and language) [DH11, PJST17]. All our experiments ran on a 2012 MacBook Pro with 8GB RAM and a 2.5GHz Intel Core i5 processor. Figure 1 caption: "Regularized weighted low-rank approximations with λ = 0.556 for landmark, λ = 314 for NIPS, and λ = 1 for the synthetic dataset."
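As a concrete reading of the best-response heuristic quoted above, each update reduces to a weighted ridge regression per row of U (or, symmetrically, per column of V). Below is a minimal numpy sketch of the unsketched alternating-minimization baseline, assuming A is m × n, U is m × k, and V is k × n; the function names and shapes are our assumptions, not the authors' released code (the paper releases none).

```python
import numpy as np

def best_response_U(A, W, V, lam):
    """Best response in U: each row of U solves a weighted ridge regression
    min_u  sum_j W[i, j] * (u @ V[:, j] - A[i, j])**2 + lam * ||u||**2."""
    m, k = A.shape[0], V.shape[0]
    U = np.zeros((m, k))
    for i in range(m):
        Vw = V * W[i]                         # V @ diag(W[i, :]), shape k x n
        G = Vw @ V.T + lam * np.eye(k)        # normal-equation matrix
        U[i] = np.linalg.solve(G, Vw @ A[i])
    return U

def alt_min(A, W, k, lam, n_steps=25, seed=0):
    """Alternating best responses for
    min ||W o (U V - A)||_F^2 + lam * (||U||_F^2 + ||V||_F^2)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Initialize U and V from random subsets of the columns/rows of A,
    # as described in the Experiment Setup entry below.
    U = A[:, rng.choice(n, size=k, replace=False)]
    V = A[rng.choice(m, size=k, replace=False), :]
    for _ in range(n_steps):
        U = best_response_U(A, W, V, lam)
        V = best_response_U(A.T, W.T, U.T, lam).T  # symmetric update in V
    return U, V
```

The V update simply transposes the problem, which is why a single helper suffices.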
Researcher Affiliation Collaboration Frank Ban (UC Berkeley / Google, fban@berkeley.edu); David Woodruff (Carnegie Mellon University, dwoodruf@cs.cmu.edu); Qiuyi (Richard) Zhang (UC Berkeley / Google, qiuyi@berkeley.edu)
Pseudocode Yes Algorithm 1 Regularized Weighted Low Rank Approximation
procedure REGWEIGHTEDLOWRANK(A, W, λ, s, k, ε)
  Sample Gaussian sketch S' ∈ ℝ^{O((s/ε) log(1/ε)) × n} from D
  Sample Gaussian sketch S'' ∈ ℝ^{O((s/ε) log(1/ε)) × n} from D
  Create matrix variables P^(i) ∈ ℝ^{k × O((s/ε) log(1/ε))} and Q^(j) ∈ ℝ^{O((s/ε) log(1/ε)) × k} for i, j from 1 to r   ▷ variables used in the polynomial system solver
  Use Cramer's rule to express U_{i,:} = A_{i,:} D_{W_{i,:}} (S')^T (P^(i))^T (P^(i) (P^(i))^T + λ I_k)^{-1} as a rational function of the variables P^(i); similarly, V_{:,j} = ((Q^(j))^T Q^(j) + λ I_k)^{-1} (Q^(j))^T S'' D_{W_{:,j}} A_{:,j}   ▷ U, V are now rational functions of the variables P, Q
  Solve min_{U,V} ‖W ∘ (UV − A)‖²_F + λ‖U‖²_F + λ‖V‖²_F and apply binary search to find U, V   ▷ optimization with the polynomial solver of Theorem 1 in the variables P, Q
  return U, V
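Algorithm 1 as stated relies on a polynomial system solver and is not directly implementable; what the experiments run instead (per the Research Type and Experiment Setup entries) is a sketched best-response update. The following numpy sketch illustrates one such update in U, assuming a single Gaussian sketch S ∈ ℝ^{t × n} with 1/√t scaling shared across rows; the function name and scaling convention are ours, not the paper's.

```python
import numpy as np

def sketched_best_response_U(A, W, V, lam, t, rng):
    """Best response in U computed in a t-dimensional sketched space:
    for each row i, solve
    min_u ||S @ diag(sqrt(W[i])) @ (V.T @ u - A[i])||^2 + lam * ||u||^2
    with a shared Gaussian sketch S of shape t x n."""
    m, n = A.shape
    k = V.shape[0]
    S = rng.standard_normal((t, n)) / np.sqrt(t)  # Gaussian sketch
    U = np.zeros((m, k))
    for i in range(m):
        sqrt_w = np.sqrt(W[i])
        Vt = S @ (sqrt_w[:, None] * V.T)   # t x k sketched design matrix
        at = S @ (sqrt_w * A[i])           # t-dimensional sketched target
        G = Vt.T @ Vt + lam * np.eye(k)
        U[i] = np.linalg.solve(G, Vt.T @ at)
    return U

# Example: one sketched update at sketch size t = 30 (V update is symmetric):
# U = sketched_best_response_U(A, W, V, lam=1.0, t=30, rng=np.random.default_rng(0))
```

Solving in the t-dimensional sketched space rather than the full n-dimensional one is what the paper's theory justifies when t scales with the statistical dimension.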
Open Source Code No The paper does not provide explicit links or statements about making the source code for their methodology available.
Open Datasets Yes We used a synthetic dataset and several real datasets (connectus, NIPS, landmark, and language) [DH11, PJST17].
Dataset Splits No The paper does not specify how the datasets were split into training, validation, and testing sets or provide split percentages/counts.
Hardware Specification Yes All our experiments ran on a 2012 MacBook Pro with 8GB RAM and a 2.5GHz Intel Core i5 processor.
Software Dependencies No We used the built-in svd function in numpy's linear algebra package. The paper mentions 'numpy' but does not specify its version number or any other software dependencies with versions.
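For reference, the numpy dependency mentioned here amounts to calls like the following built-in SVD, shown computing a rank-k truncation; the matrix size is a placeholder of ours.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 150))   # placeholder matrix (assumption)
k = 50

# numpy's built-in SVD, as reported in the paper; rank-k truncation of A.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ Vt[:k]
```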
Experiment Setup Yes For all datasets, the task was to find a rank k = 50 decomposition of a given matrix A. For the experiments of Figure 1 and Figure 2, we generated dense weight matrices W with the same shape as A and with each entry being a 1 with probability 0.8, a 0.1 with probability 0.15, and a 0.01 with probability 0.05. For the experiments of Figure 3, we generated binary weight matrices where each entry was 1 with probability 0.9. Note that this setting corresponds to a regularized form of matrix completion. We set the regularization parameter λ to a variety of values (described in the Figure captions) to illustrate the performance of the algorithm in different settings. ... We parameterized the experiments by t, the sketch size, which took values in {10, 15, 20, 25, 30, 35, 40, 45, 50}. ... For Alternating Minimization without Sketching, we initialized the low rank matrix factors U and V to be random subsets of the rows and columns of A respectively, then performed n = 25 steps of alternating minimization. For Alternating Minimization with Sketching, we initialized U and V the same way, but performed n = 25 best response updates in the sketched space, as in Theorem 3.
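The weight-matrix recipes and the sketch-size sweep described in this entry translate directly into numpy. Below is a minimal sketch under assumed placeholder dimensions; the cost function restates the regularized weighted objective, and the sweep body is left as a stub since the paper's exact experiment runner is not available.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 500, 400, 50          # placeholder sizes (assumptions)
A = rng.standard_normal((m, n))

# Dense weights (Figures 1-2): 1 w.p. 0.8, 0.1 w.p. 0.15, 0.01 w.p. 0.05.
W_dense = rng.choice([1.0, 0.1, 0.01], size=(m, n), p=[0.8, 0.15, 0.05])

# Binary weights (Figure 3, regularized matrix completion): 1 w.p. 0.9.
W_binary = (rng.random((m, n)) < 0.9).astype(float)

def cost(A, W, U, V, lam):
    """Regularized weighted low rank approximation objective."""
    return (np.sum(W * (U @ V - A) ** 2)
            + lam * (np.sum(U ** 2) + np.sum(V ** 2)))

# Sweep over the sketch sizes reported in the paper; each run would perform
# n = 25 best-response steps (sketched or unsketched, see sketches above).
for t in [10, 15, 20, 25, 30, 35, 40, 45, 50]:
    pass  # run alternating minimization at sketch size t and record cost(...)
```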