Single Pass PCA of Matrix Products

Authors: Shanshan Wu, Srinadh Bhojanapalli, Sujay Sanghavi, Alexandros G. Dimakis

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | in addition we also provide results from an Apache Spark implementation that shows better computational and statistical performance on real-world and synthetic evaluation datasets.
Researcher Affiliation | Academia | Shanshan Wu (The University of Texas at Austin, shanshan@utexas.edu); Srinadh Bhojanapalli (Toyota Technological Institute at Chicago, srinadh@ttic.edu); Sujay Sanghavi (The University of Texas at Austin, sanghavi@mail.utexas.edu); Alexandros G. Dimakis (The University of Texas at Austin, dimakis@austin.utexas.edu)
Pseudocode | Yes | Algorithm 1 SMP-PCA: Streaming Matrix Product PCA
Open Source Code | Yes | The source code is available at [18]. [18] S. Wu, S. Bhojanapalli, S. Sanghavi, and A. Dimakis. Github repository for "single-pass pca of matrix products". https://github.com/wushanshan/MatrixProductPCA, 2016.
Open Datasets | Yes | We test our algorithm on synthetic datasets and three real datasets: SIFT10K [9], NIPS-BW [11], and URL-reputation [12].
Dataset Splits | No | The paper names the datasets it uses (SIFT10K, NIPS-BW, URL-reputation) but does not describe how they were split into training, validation, or test sets, and gives no proportions or sample counts for any split.
Hardware Specification | Yes | using a 150GB synthetic dataset on m3.2xlarge Amazon EC2 instances. ... Footnote 6: Each machine has 8 cores, 30GB memory, and 2×80GB SSD.
Software Dependencies | Yes | We implement our SMP-PCA in Apache Spark 1.6.2 [19].
Experiment Setup | Yes | For all rest experiments, unless otherwise specified, we set r = 5, T = 10, and m as 4nr log n.
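
The default setting quoted under Experiment Setup can be turned into concrete numbers with a small helper. This is only an illustrative sketch: the function name `default_setup` is hypothetical, the logarithm is assumed to be the natural log (the excerpt does not state the base), and m is rounded up to an integer, which is also an assumption.

```python
import math


def default_setup(n, r=5, T=10):
    """Return the quoted defaults r = 5, T = 10 and m = 4 n r log n.

    Assumptions (not stated in the excerpt): log is the natural
    logarithm, and m is rounded up to the nearest integer.
    """
    if n < 2:
        raise ValueError("n must be at least 2 so that log n is positive")
    m = math.ceil(4 * n * r * math.log(n))
    return {"r": r, "T": T, "m": m}


# Example: parameters for n = 1000 under the assumptions above.
params = default_setup(1000)
print(params)
```

Under these assumptions m grows as n log n, so for large n it stays far below n squared.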