reproducibilityindex.ai

Oja's Algorithm for Streaming Sparse PCA

Authors: Syamantak Kumar, Purnamrita Sarkar

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Figure 1: Comparison of Sparse PCA algorithms for identifying leading eigenvector, v1, operating in O (d) space and O (nd) time with population covariance matrix specified in [QLR19], Section 5.1. Figure (a) plots [JL09] (Purple), [YX15] (Black), [WL16] (Orange) and our proposed Algorithm 2 (Blue) for n = d = 1000, with error bars over 100 random runs. Figure (b) shows an image of the covariance matrix with n = d = 100. Figure 2: We use Σ used in [QLR19], Section 5.1. (a) Variation of log \|e i Bnu0\| for i S and i / S (y-axis) with n (x-axis) for a fixed unit vector u0. η is set as Theorem 3.5 and n grows from 1 to 1000. The lines labelled sample plot log(\|e i Bnu0\|), whereas the population curves plot log(\|E e i Bnu0 \|). (b) Variation of log Bn BT n and log v T 1 Bn BT n v1 (y-axis) with n [300] (x-axis). We also plot log of the bound of Bn BT n as in [JJK+16] and 2n log (1 + ηλ1) for comparison.
Researcher Affiliation	Academia	Syamantak Kumar1 Purnamrita Sarkar2 1Department of Computer Science, UT Austin 2Department of Statistics and Data Sciences, UT Austin syamantak@utexas.edu, purna.sarkar@austin.utexas.edu
Pseudocode	Yes	Algorithm 1 Oja Support Recovery {Xi}i [n] , k, η 1: Input : Dataset {Xi}i [n], Cardinality parameter k s, learning rate η > 0 2: u0 N (0, I) 3: ˆv Oja {Xi}i [n] , η, y0 4: b S Indices of k largest values of \|ˆv\| 5: return b S
Open Source Code	Yes	We make the code for our experiments available in the supplementary material.
Open Datasets	Yes	Given a dataset {Xi}i=1,...,n where Xi Rd, sampled independently from a distribution with mean zero and covariance matrix Σ, the goal in PCA is to find the directions that explain most of the variance in the data. Figure 1: Comparison of Sparse PCA algorithms for identifying leading eigenvector, v1, operating in O (d) space and O (nd) time with population covariance matrix specified in [QLR19], Section 5.1.
Dataset Splits	Yes	For the results in this section, we split the dataset D := {Xi}i [n] into two halves and estimate the support using the first half as b S Oja Support Recovery {Xi}i [ n 2 ] , k, η := 3 log(n) n(λ1 λ2) and input, k s. The second half of the samples are then used to compute the estimated sparse eigenvector.
Hardware Specification	Yes	Our experiments were performed on a single Macbook Pro M2 2022 CPU with 8 GB RAM.
Software Dependencies	No	The paper does not provide specific software dependencies or version numbers (e.g., programming languages, libraries, frameworks) required to reproduce the experiments.
Experiment Setup	Yes	Oja s algorithm with constant learning rate. With a constant learning rate, η, and initial vector, u0, Oja s algorithm [Oja82a], denoted as Oja {Xt}t [n] , η, u0 , performs the updates, ut (I + ηXt XT t )ut 1, ut ut ut 2 . For convenience of analysis, we also define t [n], Bt := I + ηXt XT t I + ηXt 1XT t 1 I + ηX1XT 1 , B0 = I (3) ... 2: u0 N (0, I) ... where η := 3 log(n) n(λ1 λ2).