Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
On the number of variables to use in principal component regression
Authors: Ji Xu, Daniel J. Hsu
NeurIPS 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We give an average-case analysis of the out-of-sample prediction error as p, n, N → ∞ with p/N → α and n/N → β, for some constants α ∈ [0, 1] and β ∈ (0, 1). In this average-case setting, the prediction error exhibits a double descent shape as a function of p. We also establish conditions under which the minimum risk is achieved in the interpolating (p > n) regime. The proofs of the results are detailed in the full version of the paper [19]. |
| Researcher Affiliation | Academia | Ji Xu, Columbia University, EMAIL; Daniel Hsu, Columbia University, EMAIL |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statements about open-source code availability or links to repositories. |
| Open Datasets | No | The paper uses a synthetic data model ('Our data (x_1, y_1), . . . , (x_n, y_n) are assumed to be i.i.d. with x_i ∼ N(0, Σ)') for theoretical analysis and does not mention or provide access to any public datasets. |
| Dataset Splits | No | The paper is theoretical and analyzes a synthetic data model. It does not describe any specific training, validation, or test dataset splits for empirical reproduction. |
| Hardware Specification | No | The paper is theoretical and does not mention any specific hardware used for experiments. |
| Software Dependencies | No | The paper does not provide any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an empirical experimental setup with hyperparameters or system-level training settings. |