Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Oja's Algorithm for Streaming Sparse PCA
Authors: Syamantak Kumar, Purnamrita Sarkar
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Figure 1: Comparison of Sparse PCA algorithms for identifying leading eigenvector, v1, operating in O (d) space and O (nd) time with population covariance matrix specified in [QLR19], Section 5.1. Figure (a) plots [JL09] (Purple), [YX15] (Black), [WL16] (Orange) and our proposed Algorithm 2 (Blue) for n = d = 1000, with error bars over 100 random runs. Figure (b) shows an image of the covariance matrix with n = d = 100. Figure 2: We use Σ used in [QLR19], Section 5.1. (a) Variation of log |e i Bnu0| for i S and i / S (y-axis) with n (x-axis) for a fixed unit vector u0. η is set as Theorem 3.5 and n grows from 1 to 1000. The lines labelled sample plot log(|e i Bnu0|), whereas the population curves plot log(|E e i Bnu0 |). (b) Variation of log Bn BT n and log v T 1 Bn BT n v1 (y-axis) with n [300] (x-axis). We also plot log of the bound of Bn BT n as in [JJK+16] and 2n log (1 + ηλ1) for comparison. |
| Researcher Affiliation | Academia | Syamantak Kumar1 Purnamrita Sarkar2 1Department of Computer Science, UT Austin 2Department of Statistics and Data Sciences, UT Austin EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 Oja Support Recovery {Xi}i [n] , k, η 1: Input : Dataset {Xi}i [n], Cardinality parameter k s, learning rate η > 0 2: u0 N (0, I) 3: ˆv Oja {Xi}i [n] , η, y0 4: b S Indices of k largest values of |ˆv| 5: return b S |
| Open Source Code | Yes | We make the code for our experiments available in the supplementary material. |
| Open Datasets | Yes | Given a dataset {Xi}i=1,...,n where Xi Rd, sampled independently from a distribution with mean zero and covariance matrix Σ, the goal in PCA is to find the directions that explain most of the variance in the data. Figure 1: Comparison of Sparse PCA algorithms for identifying leading eigenvector, v1, operating in O (d) space and O (nd) time with population covariance matrix specified in [QLR19], Section 5.1. |
| Dataset Splits | Yes | For the results in this section, we split the dataset D := {Xi}i [n] into two halves and estimate the support using the first half as b S Oja Support Recovery {Xi}i [ n 2 ] , k, η := 3 log(n) n(λ1 λ2) and input, k s. The second half of the samples are then used to compute the estimated sparse eigenvector. |
| Hardware Specification | Yes | Our experiments were performed on a single Macbook Pro M2 2022 CPU with 8 GB RAM. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers (e.g., programming languages, libraries, frameworks) required to reproduce the experiments. |
| Experiment Setup | Yes | Oja s algorithm with constant learning rate. With a constant learning rate, η, and initial vector, u0, Oja s algorithm [Oja82a], denoted as Oja {Xt}t [n] , η, u0 , performs the updates, ut (I + ηXt XT t )ut 1, ut ut ut 2 . For convenience of analysis, we also define t [n], Bt := I + ηXt XT t I + ηXt 1XT t 1 I + ηX1XT 1 , B0 = I (3) ... 2: u0 N (0, I) ... where η := 3 log(n) n(λ1 λ2). |