Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Principal Component Projection Without Principal Component Analysis
Authors: Roy Frostig, Cameron Musco, Christopher Musco, Aaron Sidford
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conclude with an empirical evaluation of PC-PROJ and RIDGE-PCR (Algorithms 1 and 2). Since PCR has already been justified as a statistical technique, we focus on showing that, with few iterations, our algorithm recovers an accurate approximation to $A_\lambda^\dagger b$ and $P_{A_\lambda} y$. We begin with synthetic data, which lets us control the spectral gap $\gamma$ that dominates our iteration bounds (see Theorem 3.2). Data is generated randomly... As apparent in Figure 2, our algorithm performs very well for regression, even for small $\gamma$. |
| Researcher Affiliation | Collaboration | Roy Frostig (Stanford University); Cameron Musco and Christopher Musco (MIT); Aaron Sidford (Microsoft Research, New England) |
| Pseudocode | Yes | Algorithm 1 (PC-PROJ) Principal component projection; Algorithm 2 (RIDGE-PCR) Ridge regression-based PCR |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | Yes | Finally, we consider a 60K-point regression problem constructed from MNIST classification data (LeCun et al., 2015). |
| Dataset Splits | No | The paper mentions 'synthetic data' and 'MNIST classification data' but does not specify how these datasets were split into training, validation, or test sets for experiments. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks with their versions) used for the experiments. |
| Experiment Setup | Yes | Data is generated randomly by drawing top singular values uniformly from the range $[0.5(1 + \gamma), 1]$ and tail singular values from $[0, 0.5(1 - \gamma)]$. $\lambda$ is set to $0.5$ and $A$ (500 rows, 200 columns) is formed via the SVD $U \Sigma V^\top$ where $U$ and $V$ are random bases and $\Sigma$ contains our random singular values. ... The MNIST principal component regression was run with $\lambda = 0.01\sigma_1^2$. |
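
The synthetic setup quoted in the Experiment Setup row can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `make_synthetic`, the choice of how many singular values fall in the "top" group (half of them here), the use of QR on Gaussian matrices to obtain random orthonormal bases, and the seed are all hypothetical, since the quoted text does not specify them.

```python
import numpy as np


def make_synthetic(gamma, n_rows=500, n_cols=200, n_top=None, seed=0):
    """Build A = U diag(sigma) V^T with a gap in its singular values.

    Top singular values are drawn uniformly from [0.5(1 + gamma), 1] and
    tail singular values from [0, 0.5(1 - gamma)], as in the quoted setup.
    """
    rng = np.random.default_rng(seed)

    # ASSUMPTION: the source does not say how many singular values are
    # "top" versus "tail"; half of each is an arbitrary choice.
    if n_top is None:
        n_top = n_cols // 2

    top = rng.uniform(0.5 * (1 + gamma), 1.0, size=n_top)
    tail = rng.uniform(0.0, 0.5 * (1 - gamma), size=n_cols - n_top)
    sigma = np.sort(np.concatenate([top, tail]))[::-1]  # descending order

    # ASSUMPTION: "random bases" are taken to be orthonormal factors from
    # the QR decomposition of Gaussian matrices.
    U, _ = np.linalg.qr(rng.standard_normal((n_rows, n_cols)))  # 500 x 200
    V, _ = np.linalg.qr(rng.standard_normal((n_cols, n_cols)))  # 200 x 200

    # Scale the columns of U by sigma, then rotate by V^T.
    A = (U * sigma) @ V.T
    return A, sigma


if __name__ == "__main__":
    A, sigma = make_synthetic(gamma=0.1)
    print(A.shape, sigma.max(), sigma.min())
```

Because `U` and `V` have orthonormal columns, the singular values of `A` are exactly the drawn `sigma`, so none of them falls strictly between `0.5 * (1 - gamma)` and `0.5 * (1 + gamma)`. Shrinking `gamma` narrows that gap.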