Principal Component Projection Without Principal Component Analysis
Authors: Roy Frostig, Cameron Musco, Christopher Musco, Aaron Sidford
ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conclude with an empirical evaluation of PC-PROJ and RIDGE-PCR (Algorithms 1 and 2). Since PCR has already been justified as a statistical technique, we focus on showing that, with few iterations, our algorithm recovers an accurate approximation to A⁺_λ b and P_{A_λ} y. We begin with synthetic data, which lets us control the spectral gap γ that dominates our iteration bounds (see Theorem 3.2). Data is generated randomly... As apparent in Figure 2, our algorithm performs very well for regression, even for small γ. |
| Researcher Affiliation | Collaboration | Roy Frostig (RF@CS.STANFORD.EDU), Stanford University; Cameron Musco (CNMUSCO@MIT.EDU) and Christopher Musco (CPMUSCO@MIT.EDU), MIT; Aaron Sidford (ASID@MICROSOFT.COM), Microsoft Research, New England |
| Pseudocode | Yes | Algorithm 1 (PC-PROJ): Principal component projection; Algorithm 2 (RIDGE-PCR): Ridge regression-based PCR. An illustrative (non-authoritative) sketch of the ridge-to-projection idea appears after this table. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | Yes | Finally, we consider a 60K-point regression problem constructed from MNIST classification data (Le Cun et al., 2015). |
| Dataset Splits | No | The paper mentions 'synthetic data' and 'MNIST classification data' but does not specify how these datasets were split into training, validation, or test sets for experiments. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks with their versions) used for the experiments. |
| Experiment Setup | Yes | Data is generated randomly by drawing top singular values uniformly from the range [.5(1 + γ), 1] and tail singular values from [0, .5(1 − γ)]. λ is set to .5 and A (500 rows, 200 columns) is formed via the SVD UΣVᵀ, where U and V are random bases and Σ contains our random singular values. ... The MNIST principal component regression was run with λ = .01σ₁². A sketch of this synthetic setup appears after the table. |
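
The paper's algorithms are not reproduced in this report, but the core idea can be illustrated. The ridge operator B = (AᵀA + λI)⁻¹AᵀA shares eigenvectors with AᵀA, and its eigenvalues σᵢ²/(σᵢ² + λ) lie above 1/2 exactly when σᵢ² > λ, so sharpening that soft step toward a hard 0/1 step yields (approximately) the projection onto the top principal components. The NumPy sketch below is a minimal illustration of that idea only: the sharpening polynomial (an iterated smoothstep) is a stand-in rather than the paper's construction, and materializing B as a dense matrix is exactly the cost PC-PROJ avoids by calling ridge regression as a black box.

```python
import numpy as np

def approx_pc_projection(A, lam, y, iters=8):
    """Illustrative sketch: approximately project y onto the principal
    components of A whose squared singular value exceeds lam."""
    d = A.shape[1]
    AtA = A.T @ A
    # Ridge operator: eigenvalues sigma_i^2 / (sigma_i^2 + lam) in [0, 1),
    # above 1/2 exactly when sigma_i^2 > lam.
    M = np.linalg.solve(AtA + lam * np.eye(d), AtA)
    for _ in range(iters):
        # Iterated smoothstep 3x^2 - 2x^3 pushes eigenvalues toward {0, 1}
        # with threshold 1/2.  Stand-in polynomial, not the paper's.
        M2 = M @ M
        M = 3 * M2 - 2 * M2 @ M
    return M @ y  # roughly P_{A_lambda} y
```

For well-separated spectra this sharpening converges quickly; the paper's analysis instead ties the required polynomial degree to the spectral gap γ, consistent with the γ-dependence quoted in the Research Type row.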
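The synthetic setup quoted in the Experiment Setup row can also be sketched directly. The helper below (make_synthetic is a hypothetical name) follows the quoted parameters: a 500 x 200 matrix A = UΣVᵀ with random bases, top singular values drawn from [.5(1 + γ), 1] and tail singular values from [0, .5(1 − γ)]. How many singular values fall on each side of the gap, and whether the quoted ranges refer to σᵢ or σᵢ², is not stated in the quote, so the even split and the literal-σᵢ reading below are assumptions.

```python
import numpy as np

def make_synthetic(n=500, d=200, gamma=0.05, n_top=None, seed=0):
    """Hypothetical reconstruction of the quoted synthetic data generator."""
    rng = np.random.default_rng(seed)
    n_top = d // 2 if n_top is None else n_top  # assumption: split not quoted
    top = rng.uniform(0.5 * (1 + gamma), 1.0, size=n_top)
    tail = rng.uniform(0.0, 0.5 * (1 - gamma), size=d - n_top)
    sigma = np.sort(np.concatenate([top, tail]))[::-1]
    # Random orthonormal bases U (n x d) and V (d x d).
    U, _ = np.linalg.qr(rng.standard_normal((n, d)))
    V, _ = np.linalg.qr(rng.standard_normal((d, d)))
    A = (U * sigma) @ V.T  # A = U diag(sigma) V^T
    return A, sigma
```

With λ = .5 as quoted, the top and tail draws sit on either side of a gap of width γ around .5.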