Fast and Memory Optimal Low-Rank Matrix Approximation

Authors: Se-Young Yun, Marc Lelarge, Alexandre Proutiere

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, we revisit the problem of constructing a near-optimal rank-k approximation of a matrix M ∈ [0,1]^{m×n} under the streaming data model, where the columns of M are revealed sequentially. We present SLA (Streaming Low-rank Approximation), an algorithm that is asymptotically accurate when k·s_{k+1}(M) = o(√(mn)), where s_{k+1}(M) is the (k+1)-th largest singular value of M. This means that its average mean-square error converges to 0 as m and n grow large (i.e., ‖M̂^(k) − M^(k)‖_F^2 = o(mn) with high probability, where M̂^(k) and M^(k) denote the output of SLA and the optimal rank-k approximation of M, respectively). Our algorithm makes one pass on the data if the columns of M are revealed in a random order, and two passes if the columns of M arrive in an arbitrary order. To reduce its memory footprint and complexity, SLA uses random sparsification and samples each entry of M with a small probability δ. In turn, SLA is memory optimal, as its required memory space scales as k(m+n), the dimension of its output. Furthermore, SLA is computationally efficient, as it runs in O(δkmn) time (a constant number of operations is made for each observed entry of M), which can be as small as O(k log^4(m) n) for an appropriate choice of δ and if n ≥ m. The analysis of the performance of SLA is presented in Theorems 7 and 8. (An illustrative code sketch of the streaming-with-sparsification idea follows this table.)
Researcher Affiliation | Collaboration | Se-Young Yun (MSR, Cambridge) seyoung.yun@inria.fr; Marc Lelarge (Inria & ENS) marc.lelarge@ens.fr; Alexandre Proutiere (KTH, EE School / ACL) alepro@kth.se
Pseudocode | Yes | Algorithm 1: Streaming Low-rank Approximation (SLA); Algorithm 2: Spectral PCA (SPCA)
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology described in the paper is publicly available.
Open Datasets | No | The paper presents a theoretical algorithm and its analysis, operating on a generic matrix M. It does not refer to any specific dataset or provide access information for one.
Dataset Splits | No | The paper is theoretical and does not describe any empirical validation process involving dataset splits.
Hardware Specification | No | The paper is theoretical and does not describe any experimental setup or specific hardware used for computations.
Software Dependencies | No | The paper describes an algorithm and its theoretical properties but does not specify any software dependencies or version numbers.
Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with specific details such as hyperparameters or training configurations.
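
To make the streaming-plus-sparsification idea from the abstract concrete, here is a minimal Python sketch. It is not the paper's exact Algorithm 1/2: the two-phase shape (a spectral step on a first block of sparsely sampled columns, then projection of the remaining sampled columns onto the learned subspace), the 1/δ rescaling, and all names (`sla_sketch`, `block_size`) are illustrative assumptions layered on the abstract's description.

```python
# Illustrative sketch of streaming low-rank approximation with random
# sparsification, in the spirit of SLA. This is an assumption-laden
# simplification, NOT the paper's exact Algorithm 1/2.
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import svds

def sla_sketch(column_stream, m, n, k, delta, block_size, seed=0):
    """Consume the columns of an m x n matrix one by one and return factors
    U (m x k) and V (n x k) with M ~ U @ V.T, using O(k(m+n)) memory."""
    rng = np.random.default_rng(seed)
    cols = iter(column_stream)

    # Phase 1 (spectral step): sparsely sample the first `block_size` columns.
    # Each entry is kept with probability delta and rescaled by 1/delta so the
    # sampled matrix is an unbiased estimate of the original block.
    block = lil_matrix((m, block_size))
    for j in range(block_size):
        col = next(cols)
        kept = np.flatnonzero(rng.random(m) < delta)
        block[kept, j] = col[kept] / delta

    # Top-k singular triplets of the sampled block give an estimated basis U
    # of the column space (the role Spectral PCA plays in the paper).
    U, s, Vt = svds(block.tocsc(), k=k)
    V = np.zeros((n, k))
    V[:block_size] = (s[:, None] * Vt).T      # coefficients of the first block

    # Phase 2: for every remaining column, sample entries with probability
    # delta and store only its k coordinates in the basis U.
    for j, col in enumerate(cols, start=block_size):
        kept = np.flatnonzero(rng.random(m) < delta)
        V[j] = U[kept].T @ (col[kept] / delta)  # unbiased sparse projection

    # The rank-k approximation U @ V.T is never materialised; only the
    # k(m+n) numbers in U and V are stored, matching the memory bound.
    return U, V
```

On a matrix that is close to rank k, U @ V.T then approximates the best rank-k approximation, with accuracy degrading as δ shrinks; the paper's Theorems 7 and 8 quantify this trade-off for the actual SLA.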