Principal Component Projection Without Principal Component Analysis

Authors: Roy Frostig, Cameron Musco, Christopher Musco, Aaron Sidford

ICML 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conclude with an empirical evaluation of PC-PROJ and RIDGE-PCR (Algorithms 1 and 2). Since PCR has already been justified as a statistical technique, we focus on showing that, with few iterations, our algorithm recovers an accurate approximation to A_λb and P_{A_λ}y. We begin with synthetic data, which lets us control the spectral gap γ that dominates our iteration bounds (see Theorem 3.2). Data is generated randomly... As apparent in Figure 2, our algorithm performs very well for regression, even for small γ.
Researcher Affiliation | Collaboration | Roy Frostig (RF@CS.STANFORD.EDU), Stanford University; Cameron Musco (CNMUSCO@MIT.EDU), MIT; Christopher Musco (CPMUSCO@MIT.EDU), MIT; Aaron Sidford (ASID@MICROSOFT.COM), Microsoft Research, New England
Pseudocode | Yes | Algorithm 1 (PC-PROJ): principal component projection; Algorithm 2 (RIDGE-PCR): ridge regression-based PCR
Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository.
Open Datasets | Yes | Finally, we consider a 60K-point regression problem constructed from MNIST classification data (Le Cun et al., 2015).
Dataset Splits | No | The paper mentions 'synthetic data' and 'MNIST classification data' but does not specify how these datasets were split into training, validation, or test sets for experiments.
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory, or cloud instances) used for running the experiments.
Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks with their versions) used for the experiments.
Experiment Setup | Yes | Data is generated randomly by drawing top singular values uniformly from the range [.5(1 + γ), 1] and tail singular values from [0, .5(1 − γ)]. λ is set to .5 and A (500 rows, 200 columns) is formed via the SVD UΣV^T, where U and V are random bases and Σ contains our random singular values. ... The MNIST principal component regression was run with λ = .01σ_1^2.
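The synthetic setup quoted above can be sketched in a few lines of NumPy. This is our reconstruction, not the authors' code: the 500 × 200 shape, the two sampling ranges, and λ = .5 come from the quoted setup, while the top/tail split k and the reading of the drawn values as eigenvalues of AᵀA (so that λ = .5 cleanly separates the two ranges) are our assumptions. The last lines show the ridge step that, per the paper, underlies PC-PROJ: applying (AᵀA + λI)⁻¹AᵀA scales each principal direction by σᵢ²/(σᵢ² + λ), which exceeds 1/2 exactly when σᵢ² > λ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of the paper's synthetic setup.  The split k = 100 and the
# interpretation of the drawn values as eigenvalues of A^T A are our guesses.
n, d, k = 500, 200, 100
gamma, lam = 0.1, 0.5

top = rng.uniform(0.5 * (1 + gamma), 1.0, size=k)        # top of the spectrum
tail = rng.uniform(0.0, 0.5 * (1 - gamma), size=d - k)   # tail of the spectrum
eigs = np.sort(np.concatenate([top, tail]))[::-1]        # eigenvalues of A^T A

U, _ = np.linalg.qr(rng.standard_normal((n, d)))  # random orthonormal basis (n x d)
V, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthonormal basis (d x d)
A = U @ np.diag(np.sqrt(eigs)) @ V.T              # A = U Sigma V^T

# Core primitive: one ridge-regression step maps x to (A^T A + lam I)^{-1} A^T A x,
# scaling the component along the i-th right singular vector by
# eigs_i / (eigs_i + lam) -- a smooth surrogate for projection onto the
# principal components with eigs_i > lam.
AtA = A.T @ A
x = rng.standard_normal(d)
x_soft = np.linalg.solve(AtA + lam * np.eye(d), AtA @ x)

factors = eigs / (eigs + lam)   # > 1/2 for the top k directions, < 1/2 otherwise
```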
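To see why only a few iterations suffice (the claim quoted under Research Type), note that the ridge shrinkage factors rᵢ = σᵢ²/(σᵢ² + λ) sit above 1/2 for the top components and below 1/2 for the tail, so any polynomial approximating a step at 1/2 sharpens the soft projection toward the exact one. The smoothstep iteration below is our illustration of this effect, not the paper's optimized polynomial; the example eigenvalues are the edges of the two sampled ranges for γ = 0.1.

```python
import numpy as np

# Edge-case eigenvalues of A^T A for gamma = 0.1, with lam = .5 as in the
# paper's synthetic setup: two from the top range, two from the tail range.
lam = 0.5
eigs = np.array([1.0, 0.55, 0.45, 0.1])
r = eigs / (eigs + lam)          # ridge shrinkage factors, separated around 1/2

# Smoothstep s(r) = 3r^2 - 2r^3 fixes 0, 1/2, and 1 while repelling from 1/2,
# so a few applications drive factors above the threshold toward 1 and those
# below it toward 0, with speed governed by the gap gamma.
for _ in range(8):
    r = 3 * r**2 - 2 * r**3
```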