Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Low Precision Streaming PCA
Authors: Sanjoy Dasgupta, Syamantak Kumar, Shourya Pandey, Purnamrita Sarkar
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations on synthetic streams validate our theoretical findings and demonstrate that our low-precision methods closely track the performance of standard Oja's algorithm. (Section 5: Experiments) |
| Researcher Affiliation | Academia | Sanjoy Dasgupta (University of California San Diego); Syamantak Kumar, Shourya Pandey, and Purnamrita Sarkar (University of Texas at Austin) |
| Pseudocode | Yes | Algorithm 1: Quantized Oja's Algorithm with Batches |
| Open Source Code | Yes | We will submit the code with the supplementary material. |
| Open Datasets | Yes | Empirical evaluations on synthetic streams validate our theoretical findings... We use the MNIST dataset [LBBH98] of images of handwritten digits (0 through 9)... The Human Activity Recognition (HAR) Dataset [AGO+13] contains smartphone sensor readings... |
| Dataset Splits | No | The batched variant follows Eq. 2 with $b = 100$ (for Figures 2a and 2b) and $b = 25$ (for Figure 2c) equal-sized batches. Algorithm 2 begins by partitioning $m$ data points $\{X_i\}_{i \in [m]}$ into $r = \Theta(\log(1/\theta))$ disjoint batches of size $n$ each and runs the algorithm $A$ on each batch. The paper discusses batching for the streaming algorithm, not explicit training/validation/test splits of the dataset for evaluation. |
| Hardware Specification | Yes | All experiments were done on a personal computer with a single CPU. |
| Software Dependencies | No | The paper does not specify any particular software libraries, frameworks, or their version numbers used for the experiments. It only generally mentions C++, Python, and MATLAB as languages where logarithmic schemes are used, not as specific dependencies for their implementation. |
| Experiment Setup | Yes | We set the learning rate to $\eta = \frac{2\ln(n)}{n(\lambda_1 - \lambda_2)}$ for the standard method and to $\eta = \frac{2\ln(n)}{b(\lambda_1 - \lambda_2)}$ for the batched methods. Every trial begins from a random Gaussian vector normalized to unit length. Each configuration is run for $R = 100$ independent trials. In Experiment 1 we fix $d = 100$ and vary $n \in \{1000, 2000, 3000, 4000, 5000\}$; in Experiment 2 we fix $n = 5000$ and vary $d \in \{100, 200, 300, 400, 500\}$. |
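The reported setup for the standard (unquantized) method can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the learning rate $\eta = 2\ln(n)/(n(\lambda_1 - \lambda_2))$ and the unit-normalized Gaussian initialization come from the Experiment Setup row, while the diagonal Gaussian stream, the spectrum ($\lambda_1 = 1.0$, $\lambda_j = 0.1$ for $j \ge 2$), the $\sin^2$ error metric, and the function names `learning_rate`, `oja`, and `sin2_error` are hypothetical. Sizes are scaled down from the reported $d = 100$, $n \le 5000$, $R = 100$ so the pure-Python loop runs quickly.

```python
import math
import random


def learning_rate(n, steps, lam1, lam2):
    # eta = 2 ln(n) / (steps * (lam1 - lam2)); steps = n for standard Oja, b for batched.
    return 2.0 * math.log(n) / (steps * (lam1 - lam2))


def random_unit_vector(d, rng):
    # Random Gaussian vector normalized to unit length (the reported initialization).
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sample(lam, rng):
    # ASSUMPTION: synthetic stream x ~ N(0, diag(lam)), so the top eigenvector is e_1.
    return [math.sqrt(lj) * rng.gauss(0.0, 1.0) for lj in lam]


def oja(n, lam, rng):
    # Standard Oja update per sample: w <- w + eta * x * (x . w), then renormalize.
    eta = learning_rate(n, n, lam[0], lam[1])
    w = random_unit_vector(len(lam), rng)
    for _ in range(n):
        x = sample(lam, rng)
        proj = sum(xi * wi for xi, wi in zip(x, w))
        w = [wi + eta * xi * proj for wi, xi in zip(w, x)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        w = [wi / norm for wi in w]
    return w


def sin2_error(w):
    # sin^2 of the angle between unit vector w and the true top eigenvector e_1.
    return 1.0 - w[0] ** 2


rng = random.Random(0)
d, n, R = 20, 2000, 5  # scaled down from the reported experiments
lam = [1.0] + [0.1] * (d - 1)  # hypothetical spectrum with eigengap 0.9
errors = [sin2_error(oja(n, lam, rng)) for _ in range(R)]
print(f"mean sin^2 error over {R} trials: {sum(errors) / R:.4f}")
```

Passing `steps = b` instead of `steps = n` to `learning_rate` gives the batched learning rate from the same row.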
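For the batched, low-precision variant (Algorithm 1, "Quantized Oja's Algorithm with Batches", with learning rate $\eta = 2\ln(n)/(b(\lambda_1 - \lambda_2))$), a hypothetical sketch is given below. The batch-averaged update, the choice to normalize before quantizing, the uniform stochastic-rounding quantizer with grid spacing `delta`, and all function names are assumptions made for illustration; the paper's actual quantization scheme (it also discusses logarithmic schemes) is not reproduced here.

```python
import math
import random


def sample(lam, rng):
    # ASSUMPTION: synthetic stream x ~ N(0, diag(lam)); the top eigenvector is e_1.
    return [math.sqrt(lj) * rng.gauss(0.0, 1.0) for lj in lam]


def stochastic_round(v, delta, rng):
    # Unbiased rounding to the grid delta * Z: round up with probability equal
    # to the fractional part, so that E[Q(x)] = x.
    out = []
    for x in v:
        lo = math.floor(x / delta)
        frac = x / delta - lo
        out.append((lo + (1 if rng.random() < frac else 0)) * delta)
    return out


def unit(v):
    # Scale v to unit Euclidean norm.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def batched_quantized_oja(stream, b, eta, delta, rng):
    # HYPOTHETICAL sketch: split the stream into b equal batches, take one Oja
    # step per batch with the batch-averaged update, then normalize and store
    # the iterate on the low-precision grid.
    size = len(stream) // b
    d = len(stream[0])
    w = stochastic_round(unit([rng.gauss(0.0, 1.0) for _ in range(d)]), delta, rng)
    for k in range(b):
        g = [0.0] * d
        for x in stream[k * size:(k + 1) * size]:
            proj = sum(xj * wj for xj, wj in zip(x, w))
            g = [gj + xj * proj for gj, xj in zip(g, x)]
        w = unit([wj + eta * gj / size for wj, gj in zip(w, g)])
        w = stochastic_round(w, delta, rng)
    return unit(w)


rng = random.Random(1)
d, n, b = 20, 2000, 100
lam = [1.0] + [0.1] * (d - 1)  # hypothetical spectrum with eigengap 0.9
stream = [sample(lam, rng) for _ in range(n)]
eta = 2.0 * math.log(n) / (b * (lam[0] - lam[1]))  # batched learning rate
delta = 2.0 ** -8  # hypothetical grid spacing (8 fractional bits)
w = batched_quantized_oja(stream, b, eta, delta, rng)
print(f"sin^2 error: {1.0 - w[0] ** 2:.4f}")
```

Stochastic rounding is used here because it is unbiased, so quantization error tends to average out across updates instead of accumulating as a systematic drift; deterministic round-to-nearest would be the simpler alternative.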