Reproducibility Index

Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline that was validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias, so scores should be interpreted as estimates. Full accuracy metrics and the methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56,800 conference papers," under review, 2026).

Low Precision Streaming PCA

Authors: Sanjoy Dasgupta, Syamantak Kumar, Shourya Pandey, Purnamrita Sarkar

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical evaluations on synthetic streams validate our theoretical findings and demonstrate that our low-precision methods closely track the performance of standard Oja's algorithm. (Section 5: Experiments)
Researcher Affiliation	Academia	Sanjoy Dasgupta (University of California San Diego); Syamantak Kumar, Shourya Pandey, and Purnamrita Sarkar (University of Texas at Austin)
Pseudocode	Yes	Algorithm 1: Quantized Oja's Algorithm with Batches
Open Source Code	Yes	We will submit the code with the supplementary material.
Open Datasets	Yes	Empirical evaluations on synthetic streams validate our theoretical findings... We use the MNIST dataset [LBBH98] of images of handwritten digits (0 through 9)... The Human Activity Recognition (HAR) Dataset [AGO+13] contains smartphone sensor readings...
Dataset Splits	No	The batched variant follows Eq. 2 with $b = 100$ (for Figures 2a and 2b) and $b = 25$ (for Figure 2c) equal-sized batches. Algorithm 2 begins by partitioning $m$ data points $\{X_i\}_{i \in [m]}$ into $r = \Theta(\log(1/\theta))$ disjoint batches of size $n$ each and runs the algorithm $\mathcal{A}$ on each batch. The paper discusses batching for the streaming algorithm, not explicit training/validation/test splits of the dataset for evaluation.
Hardware Specification	Yes	All experiments were done on a personal computer with a single CPU.
Software Dependencies	No	The paper does not specify any software libraries, frameworks, or version numbers used in the experiments. It mentions C++, Python, and MATLAB only as languages in which logarithmic schemes are used, not as dependencies of the authors' implementation.
Experiment Setup	Yes	We set the learning rate to $\eta = \frac{2\ln(n)}{n(\lambda_1 - \lambda_2)}$ for the standard method and to $\eta = \frac{2\ln(n)}{b(\lambda_1 - \lambda_2)}$ for the batched methods. Every trial begins from a random Gaussian vector normalized to unit length. Each configuration is run for $R = 100$ independent trials. In Experiment 1 we fix $d = 100$ and vary $n \in \{1000, 2000, 3000, 4000, 5000\}$; in Experiment 2 we fix $n = 5000$ and vary $d \in \{100, 200, 300, 400, 500\}$.
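The Pseudocode row above names Algorithm 1, "Quantized Oja's Algorithm with Batches," but this index does not reproduce it. The following is a minimal Python sketch of the general idea, not the authors' algorithm. It assumes that the batched update (Eq. 2) adds $\eta$ times the batch average of $x x^\top w$ to the current iterate, and it swaps in unbiased stochastic rounding to a uniform grid of spacing `delta` as a stand-in quantizer; the paper's actual quantization scheme may differ (the Software Dependencies row hints at logarithmic schemes). All function and parameter names are hypothetical.

```python
import math
import random


def stochastic_round(value, delta, rng):
    """Round value to a multiple of delta without bias (stand-in quantizer, an assumption)."""
    scaled = value / delta
    lower = math.floor(scaled)
    # Round up with probability equal to the fractional part, so E[result] == value.
    if rng.random() < scaled - lower:
        lower += 1
    return lower * delta


def normalize(vec):
    """Scale vec to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def quantized_batched_oja(samples, num_batches, eta, delta, seed=0):
    """Hypothetical sketch: batched Oja updates whose stored iterate is kept on a grid."""
    rng = random.Random(seed)
    d = len(samples[0])
    # Start from a random Gaussian vector normalized to unit length.
    w = normalize([rng.gauss(0.0, 1.0) for _ in range(d)])
    batch_size = len(samples) // num_batches  # leftover samples are ignored
    for k in range(num_batches):
        batch = samples[k * batch_size:(k + 1) * batch_size]
        # Batch average of x * (x^T w), an estimate of Sigma w (assumed form of Eq. 2).
        step = [0.0] * d
        for x in batch:
            proj = sum(xi * wi for xi, wi in zip(x, w))
            for j in range(d):
                step[j] += x[j] * proj / len(batch)
        w = normalize([wi + eta * sj for wi, sj in zip(w, step)])
        # Store the iterate in low precision: every coordinate becomes a multiple of delta.
        w = [stochastic_round(wi, delta, rng) for wi in w]
    return w
```

Rounding right after normalization keeps the stored iterate on the grid, while the renormalization at the next step stops its scale from drifting. On samples whose covariance has a well-separated top eigenvalue, the returned vector, once normalized, should align closely with the top eigenvector.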
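The Dataset Splits row quotes Algorithm 2, which partitions $m$ data points into $r = \Theta(\log(1/\theta))$ disjoint batches of size $n$ and runs $\mathcal{A}$ on each batch. The Python sketch below shows only that partitioning step. The constant `c` standing in for the $\Theta(\cdot)$ is hypothetical, and because the quoted text does not say how the $r$ outputs are combined, the function simply returns them.

```python
import math


def partition_and_run(data, theta, algorithm, c=1.0):
    """Split data into r = ceil(c * ln(1/theta)) disjoint equal-size batches and run algorithm on each.

    c is a hypothetical constant standing in for the Theta(.) in the text.
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie in (0, 1)")
    r = max(1, math.ceil(c * math.log(1.0 / theta)))
    n = len(data) // r  # batch size; any remainder is dropped
    if n == 0:
        raise ValueError("not enough data for r batches")
    batches = [data[i * n:(i + 1) * n] for i in range(r)]
    return [algorithm(batch) for batch in batches]
```

For example, with $m = 100$ points and $\theta = 0.01$, $r = \lceil \ln 100 \rceil = 5$, so `partition_and_run(list(range(100)), 0.01, len)` runs `len` on five disjoint batches of 20 points each.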
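The Experiment Setup row fixes the step size as $\eta = 2\ln(n) / (k(\lambda_1 - \lambda_2))$, with $k = n$ for the standard method and $k = b$ for the batched methods, together with two $(d, n)$ grids. A small Python helper encoding these settings might look as follows. The eigengap $\lambda_1 - \lambda_2$ is not given in this index, so it is left as an input, and the names are illustrative only.

```python
import math


def oja_learning_rate(n, eigengap, num_batches=None):
    """eta = 2 ln(n) / (k * eigengap), where k = n (standard) or k = b (batched)."""
    if eigengap <= 0:
        raise ValueError("eigengap lambda1 - lambda2 must be positive")
    k = n if num_batches is None else num_batches
    return 2.0 * math.log(n) / (k * eigengap)


# (d, n) configurations from the two experiments; each is run for R = 100 trials.
EXPERIMENT_1 = [(100, n) for n in (1000, 2000, 3000, 4000, 5000)]
EXPERIMENT_2 = [(d, 5000) for d in (100, 200, 300, 400, 500)]
NUM_TRIALS = 100
```

Since both formulas share the numerator, the batched step size is exactly $n/b$ times the standard one (50 times larger for $n = 5000$, $b = 100$), presumably to compensate for the batched methods taking only $b$ update steps.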