Is Cross-validation the Gold Standard to Estimate Out-of-sample Model Performance?
Authors: Garud Iyengar, Henry Lam, Tianyu Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our numerical results demonstrate that plug-in indeed performs no worse than CV in estimating model performance across a wide range of examples (Section 5, Numerical Experiments). |
| Researcher Affiliation | Academia | Department of Industrial Engineering and Operations Research, Columbia University, New York, NY 10027; {gi10,khl2114,tw2837}@columbia.edu |
| Pseudocode | No | The paper describes methods using mathematical formulations and prose, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | More information on the data and code are available at https://github.com/wangtianyu61/CV_Gold_Standard. |
| Open Datasets | Yes | We include one real-world dataset puma32H with 33 features and 1,000,000 samples as a regression task. The dataset is available at https://www.openml.org/d/1210. (A fetch sketch follows this table.) |
| Dataset Splits | Yes | Leave-one-out CV (LOOCV) [5, 7], which repeatedly evaluates models trained using all but one observation on the left-out observation, is a prime approach; however, it is computationally demanding as it requires model re-training for the same number of times as the sample size. Because of this, K-fold CV, which reduces the number of model re-trainings down to K times (where K is typically 5–10), becomes a popular substitute [34, 42]. We run plug-in, 5-fold CV and LOOCV with nominal level 1 − α = 0.9. (A split sketch follows this table.) |
| Hardware Specification | Yes | The experiments were run on a standard laptop with an 8-core Apple M1 processor and 16GB of RAM. |
| Software Dependencies | No | The paper mentions 'scikit-learn' and 'cvxopt' but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We consider the following optimization models by calling the standard scikit-learn package: (1) Ridge Regression, implemented through linear_model.Ridge(alpha = 1); (2) kNN, implemented through KNeighborsRegressor with the number of nearest neighbors being 2n^(2/3); (3) Random Forest, implemented through RandomForestRegressor with 50 subtrees and sample ratio n^0.6. (A configuration sketch follows this table.) |
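
The OpenML dataset cited under Open Datasets can be pulled directly with scikit-learn. This is a minimal sketch, not the authors' code; the data id 1210 comes from the quoted URL, and the printed shapes are only what the paper's description (33 features, 1,000,000 samples) leads us to expect.

```python
from sklearn.datasets import fetch_openml

# Download the puma32H dataset referenced above (OpenML data id 1210).
# Note: the full table has about 1,000,000 rows, so the download is large.
data = fetch_openml(data_id=1210, as_frame=True)
X, y = data.data, data.target

# The paper describes 33 features and 1,000,000 samples; verify locally.
print(X.shape, y.shape)
```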
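
To make the Dataset Splits row concrete, here is a small illustrative sketch (not the authors' code) contrasting 5-fold CV and LOOCV on synthetic data: LOOCV re-trains the model once per observation, while 5-fold CV re-trains it only five times. The data-generating process and the ridge model are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

# Synthetic regression data (hypothetical; any regression task would do).
rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + rng.normal(size=n)

model = Ridge(alpha=1)

# 5-fold CV: 5 model re-trainings.
mse_5fold = -cross_val_score(
    model, X, y, scoring="neg_mean_squared_error",
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
).mean()

# LOOCV: n model re-trainings, one per left-out observation.
mse_loocv = -cross_val_score(
    model, X, y, scoring="neg_mean_squared_error", cv=LeaveOneOut()
).mean()

print(f"5-fold CV MSE: {mse_5fold:.3f}  LOOCV MSE: {mse_loocv:.3f}")
```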
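
The Experiment Setup row translates almost directly into scikit-learn constructor calls. The sketch below is an assumption-laden reading of the quoted settings: the integer conversions and the interpretation of "sample ratio n^0.6" as the max_samples argument are ours, not stated in the paper.

```python
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import RandomForestRegressor

n = 1000  # hypothetical training-set size

# (1) Ridge regression with alpha = 1, as quoted.
ridge = Ridge(alpha=1)

# (2) kNN with roughly 2 * n^(2/3) neighbors (rounding is our assumption).
knn = KNeighborsRegressor(n_neighbors=int(2 * n ** (2 / 3)))

# (3) Random forest with 50 subtrees; we read "sample ratio n^0.6" as a
#     per-tree subsample of about n^0.6 observations via max_samples.
rf = RandomForestRegressor(n_estimators=50, max_samples=int(n ** 0.6))
```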