On Robustness of Principal Component Regression
Authors: Anish Agarwal, Devavrat Shah, Dennis Shen, Dogyoon Song
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | As the main contribution of this work, we address this challenge by rigorously establishing that PCR is robust to noisy, sparse, and possibly mixed valued covariates. Specifically, under PCR, vanishing prediction error is achieved with the number of samples scaling as r · max(σ², ρ⁻⁴ log⁵(p)), where ρ denotes the fraction of observed (noisy) covariates. We establish generalization error bounds on the performance of PCR, which provides a systematic approach in selecting the correct number of components r in a data-driven manner. The key to our result is a simple, but powerful equivalence between (i) PCR and (ii) Linear Regression with covariate pre-processing via Hard Singular Value Thresholding (HSVT). (A minimal numerical sketch of this equivalence appears after the table.) |
| Researcher Affiliation | Academia | The paper lists '33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.' but does not explicitly provide author affiliations (university names, company names, or email domains) within the provided text. Given that NeurIPS is a major academic conference, an academic affiliation is the most common and reasonable inference. |
| Pseudocode | No | The paper describes algorithms such as Principal Component Regression (PCR) and Linear Regression with Hard Singular Value Thresholding (HSVT) in text, but it does not include any structured pseudocode or algorithm blocks (e.g., labeled 'Pseudocode' or 'Algorithm X'). |
| Open Source Code | No | The paper does not contain any explicit statement about releasing open-source code, nor does it provide any links to a code repository. |
| Open Datasets | No | The paper describes a theoretical model for data generation (e.g., 'we are given access to a labeled dataset {(Yᵢ, Aᵢ,·)}'), but it does not explicitly name or provide access information (like links, DOIs, or citations to established public datasets) for any specific dataset used for experiments. |
| Dataset Splits | No | The paper defines notions of training and testing error for its theoretical analysis but does not provide specific details on empirical dataset splits (e.g., 80/10/10 split, cross-validation setup, or specific sample counts for training, validation, or test sets). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU models, GPU models, memory, or cloud instances) used for running experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9, or specific solver versions) that would be needed to replicate experiments. |
| Experiment Setup | No | The paper is theoretical and does not describe any empirical experiments; therefore, it does not provide specific experimental setup details such as hyperparameter values, model initialization, or training schedules. |
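
As the Pseudocode and Open Source Code rows note, the paper ships no code. For readers who want to see the PCR ↔ HSVT-plus-OLS equivalence quoted in the Research Type row in action, here is a minimal numerical sketch. It is our own illustration, not the authors' implementation: the dimensions, data-generating process, and variable names are all illustrative assumptions.

```python
# Minimal sketch (not from the paper) of the equivalence between
# (i) PCR and (ii) OLS on HSVT-denoised covariates.
import numpy as np

rng = np.random.default_rng(0)
n, p, r = 50, 20, 3  # illustrative sizes: n samples, p covariates, rank r

# Approximately rank-r covariates plus small noise, and a response vector.
A = rng.normal(size=(n, r)) @ rng.normal(size=(r, p)) + 0.1 * rng.normal(size=(n, p))
Y = rng.normal(size=n)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# (i) PCR: regress Y on the top-r principal component scores U_r S_r.
scores = U[:, :r] * s[:r]
Y_hat_pcr = scores @ np.linalg.lstsq(scores, Y, rcond=None)[0]

# (ii) HSVT pre-processing: zero out all but the top-r singular values of A,
# then run least squares (minimum-norm, since the result has rank r).
A_hsvt = (U[:, :r] * s[:r]) @ Vt[:r]
Y_hat_hsvt = A_hsvt @ np.linalg.lstsq(A_hsvt, Y, rcond=None)[0]

# Both fits project Y onto the span of the top-r left singular vectors,
# so the in-sample predictions coincide.
assert np.allclose(Y_hat_pcr, Y_hat_hsvt)
```

The equivalence holds because both procedures produce fitted values equal to the orthogonal projection of Y onto the column span of U_r, which is the observation the paper builds its robustness analysis on.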