On Robustness of Principal Component Regression
Authors: Anish Agarwal, Devavrat Shah, Dennis Shen, Dogyoon Song
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | As the main contribution of this work, we address this challenge by rigorously establishing that PCR is robust to noisy, sparse, and possibly mixed valued covariates. Specifically, under PCR, vanishing prediction error is achieved with the number of samples scaling as r · max(σ², ρ⁻⁴ log⁵(p)), where ρ denotes the fraction of observed (noisy) covariates. We establish generalization error bounds on the performance of PCR, which provides a systematic approach in selecting the correct number of components r in a data-driven manner. The key to our result is a simple, but powerful equivalence between (i) PCR and (ii) Linear Regression with covariate pre-processing via Hard Singular Value Thresholding (HSVT). (A minimal numerical sketch of this equivalence appears after the table.) |
| Researcher Affiliation | Academia | The paper lists '33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.' but does not explicitly provide author affiliations (university names, company names, or email domains) within the provided text. Given that NeurIPS is a major academic conference, an academic affiliation is the most common and reasonable inference. |
| Pseudocode | No | The paper describes algorithms such as Principal Component Regression (PCR) and Linear Regression with Hard Singular Value Thresholding (HSVT) in text, but it does not include any structured pseudocode or algorithm blocks (e.g., labeled 'Pseudocode' or 'Algorithm X'). |
| Open Source Code | No | The paper does not contain any explicit statement about releasing open-source code, nor does it provide any links to a code repository. |
| Open Datasets | No | The paper describes a theoretical model for data generation (e.g., 'we are given access to a labeled dataset {(Yᵢ, Aᵢ,·)}'), but it does not explicitly name or provide access information (like links, DOIs, or citations to established public datasets) for any specific dataset used for experiments. |
| Dataset Splits | No | The paper defines notions of training and testing error for its theoretical analysis but does not provide specific details on empirical dataset splits (e.g., 80/10/10 split, cross-validation setup, or specific sample counts for training, validation, or test sets). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU models, GPU models, memory, or cloud instances) used for running experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9, or specific solver versions) that would be needed to replicate experiments. |
| Experiment Setup | No | The paper is theoretical and does not describe any empirical experiments; therefore, it does not provide specific experimental setup details such as hyperparameter values, model initialization, or training schedules. |
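
As the Pseudocode and Open Source Code rows note, the paper ships no code. For readers who want to see the PCR ↔ HSVT-plus-OLS equivalence quoted in the Research Type row in action, here is a minimal numerical sketch. It is our own illustration, not the authors' implementation: the dimensions, data-generating process, and variable names are all illustrative assumptions.

```python
# Minimal sketch (not from the paper) of the equivalence between
# (i) PCR and (ii) OLS on HSVT-denoised covariates.
import numpy as np

rng = np.random.default_rng(0)
n, p, r = 50, 20, 3  # illustrative sizes: n samples, p covariates, rank r

# Approximately rank-r covariates plus small noise, and a response vector.
A = rng.normal(size=(n, r)) @ rng.normal(size=(r, p)) + 0.1 * rng.normal(size=(n, p))
Y = rng.normal(size=n)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# (i) PCR: regress Y on the top-r principal component scores U_r S_r.
scores = U[:, :r] * s[:r]
Y_hat_pcr = scores @ np.linalg.lstsq(scores, Y, rcond=None)[0]

# (ii) HSVT pre-processing: zero out all but the top-r singular values of A,
# then run least squares (minimum-norm, since the result has rank r).
A_hsvt = (U[:, :r] * s[:r]) @ Vt[:r]
Y_hat_hsvt = A_hsvt @ np.linalg.lstsq(A_hsvt, Y, rcond=None)[0]

# Both fits project Y onto the span of the top-r left singular vectors,
# so the in-sample predictions coincide.
assert np.allclose(Y_hat_pcr, Y_hat_hsvt)
```

The equivalence holds because both procedures produce fitted values equal to the orthogonal projection of Y onto the column span of U_r, which is the observation the paper builds its robustness analysis on.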