Oja's Algorithm for Streaming Sparse PCA
Authors: Syamantak Kumar, Purnamrita Sarkar
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Figure 1: Comparison of Sparse PCA algorithms for identifying the leading eigenvector, v_1, operating in O(d) space and O(nd) time with the population covariance matrix specified in [QLR19], Section 5.1. Figure (a) plots [JL09] (purple), [YX15] (black), [WL16] (orange) and our proposed Algorithm 2 (blue) for n = d = 1000, with error bars over 100 random runs. Figure (b) shows an image of the covariance matrix with n = d = 100. Figure 2: We use the Σ specified in [QLR19], Section 5.1. (a) Variation of log\|e_i^T B_n u_0\| for i ∈ S and i ∉ S (y-axis) with n (x-axis) for a fixed unit vector u_0. η is set as in Theorem 3.5 and n grows from 1 to 1000. The lines labelled "sample" plot log(\|e_i^T B_n u_0\|), whereas the "population" curves plot log(\|E[e_i^T B_n u_0]\|). (b) Variation of log‖B_n B_n^T‖ and log(v_1^T B_n B_n^T v_1) (y-axis) with n ∈ [300] (x-axis). We also plot the log of the bound on ‖B_n B_n^T‖ from [JJK+16] and 2n log(1 + ηλ_1) for comparison. |
| Researcher Affiliation | Academia | Syamantak Kumar (Department of Computer Science, UT Austin); Purnamrita Sarkar (Department of Statistics and Data Sciences, UT Austin). syamantak@utexas.edu, purna.sarkar@austin.utexas.edu |
| Pseudocode | Yes | Algorithm 1: Oja Support Recovery({X_i}_{i∈[n]}, k, η). 1: Input: dataset {X_i}_{i∈[n]}, cardinality parameter k ≥ s, learning rate η > 0. 2: u_0 ← N(0, I). 3: v̂ ← Oja({X_i}_{i∈[n]}, η, u_0). 4: Ŝ ← indices of the k largest values of \|v̂\|. 5: return Ŝ |
| Open Source Code | Yes | We make the code for our experiments available in the supplementary material. |
| Open Datasets | Yes | Given a dataset {X_i}_{i=1,…,n}, where X_i ∈ R^d is sampled independently from a distribution with mean zero and covariance matrix Σ, the goal in PCA is to find the directions that explain most of the variance in the data. Figure 1: Comparison of Sparse PCA algorithms for identifying the leading eigenvector, v_1, operating in O(d) space and O(nd) time with the population covariance matrix specified in [QLR19], Section 5.1. |
| Dataset Splits | Yes | For the results in this section, we split the dataset D := {X_i}_{i∈[n]} into two halves and estimate the support using the first half as Ŝ ← Oja Support Recovery({X_i}_{i∈[n/2]}, k, η := 3 log(n) / (n(λ_1 − λ_2))) with input k ≥ s. The second half of the samples is then used to compute the estimated sparse eigenvector. |
| Hardware Specification | Yes | Our experiments were performed on a single Macbook Pro M2 2022 CPU with 8 GB RAM. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers (e.g., programming languages, libraries, frameworks) required to reproduce the experiments. |
| Experiment Setup | Yes | Oja's algorithm with constant learning rate. With a constant learning rate η and initial vector u_0, Oja's algorithm [Oja82a], denoted Oja({X_t}_{t∈[n]}, η, u_0), performs the updates u_t ← (I + η X_t X_t^T) u_{t−1}, u_t ← u_t / ‖u_t‖_2. For convenience of analysis, we also define, for t ∈ [n], B_t := (I + η X_t X_t^T)(I + η X_{t−1} X_{t−1}^T) ⋯ (I + η X_1 X_1^T), B_0 = I (3) ... 2: u_0 ← N(0, I) ... where η := 3 log(n) / (n(λ_1 − λ_2)). |
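The Oja update and the support-recovery procedure quoted in the table above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' released code: the function names, the seed, and the synthetic spiked-covariance data (leading eigenvector supported on the first s coordinates, rather than the [QLR19] covariance used in the paper) are all assumptions for the sake of a runnable example.

```python
import numpy as np

def oja(X, eta, u0):
    """Oja's algorithm with constant learning rate eta:
    u_t <- (I + eta * x_t x_t^T) u_{t-1}, followed by normalization."""
    u = u0 / np.linalg.norm(u0)
    for x in X:
        u = u + eta * x * (x @ u)   # (I + eta x x^T) u, without forming x x^T
        u = u / np.linalg.norm(u)
    return u

def oja_support_recovery(X, k, eta):
    """Sketch of Algorithm 1: random Gaussian start, run Oja's algorithm,
    return the indices of the k largest entries of |v_hat|."""
    d = X.shape[1]
    rng = np.random.default_rng(0)
    u0 = rng.standard_normal(d)
    v_hat = oja(X, eta, u0)
    return np.argsort(np.abs(v_hat))[-k:]

# Illustrative data: covariance I + 4 v_1 v_1^T with v_1 supported on the
# first s coordinates, so lambda_1 = 5, lambda_2 = 1 (synthetic assumption).
rng = np.random.default_rng(1)
n, d, s = 2000, 50, 5
v1 = np.zeros(d)
v1[:s] = 1.0 / np.sqrt(s)
X = rng.standard_normal((n, d)) + 2.0 * rng.standard_normal((n, 1)) * v1

# Learning rate eta := 3 log(n) / (n (lambda_1 - lambda_2)), as in the paper.
eta = 3.0 * np.log(n) / (n * (5.0 - 1.0))
support = oja_support_recovery(X, k=s, eta=eta)
```

Each update touches only one d-dimensional vector, which is what gives the O(d) space and O(nd) time quoted in the figure captions; the support is then read off from the largest-magnitude coordinates of the final iterate.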