Cross-validation Confidence Intervals for Test Error
Authors: Pierre Bayle, Alexandre Bayle, Lucas Janson, Lester Mackey
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our real-data experiments with diverse learning algorithms, the resulting intervals and tests outperform the most popular alternative methods from the literature. |
| Researcher Affiliation | Collaboration | Pierre Bayle (Princeton University, pbayle@princeton.edu); Alexandre Bayle (Harvard University, alexandre_bayle@g.harvard.edu); Lucas Janson (Harvard University, ljanson@fas.harvard.edu); Lester Mackey (Microsoft Research New England, lmackey@microsoft.com) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Complete experimental details are available in App. K.1, and code replicating all experiments can be found at https://github.com/alexandre-bayle/cvci. |
| Open Datasets | Yes | We use the Higgs dataset of [6, 7] to study the classification error... and the Kaggle Flight Delays dataset of [1] to study the mean-squared regression error... |
| Dataset Splits | Yes | We fix k = 10, use 90-10 train-validation splits for all tests save 5×2-fold CV, and report our results using σ̂²_{n,out} (as σ̂²_{n,in} results are nearly identical). (A sketch of this interval construction appears after the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models or memory used for running experiments. |
| Software Dependencies | No | The paper mentions machine-learning libraries such as scikit-learn (in its references) and names various algorithms, but it does not specify exact version numbers for any software dependencies. |
| Experiment Setup | Yes | We fix k = 10, use 90-10 train-validation splits for all tests save 5×2-fold CV... Complete experimental details are available in App. K.1... For random forest, we used RandomForestClassifier and RandomForestRegressor from scikit-learn [44] with max_depth=6 for Higgs and max_depth=10 for Flight Delays, and n_estimators=100. For neural network classification, we used a three-layer neural network with 100 units per layer, ReLU activation, Adam optimizer (with learning rate 10⁻³), and batch size 64. For ℓ2-penalized logistic regression classification and ridge regression, we used LogisticRegression and Ridge from scikit-learn [44], respectively, with penalty parameter α = 1. (A scikit-learn configuration sketch appears after the table.) |
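The interval construction quoted in the Dataset Splits row (k-fold CV error reported with σ̂²_{n,out}) follows a standard normal-approximation form. Below is a minimal sketch, assuming the interval is R̂_n ± z_{1−α/2} · σ̂_n/√n and reading σ̂²_{n,out} as the average within-fold variance of held-out losses; the authoritative estimator definitions are in the paper and the linked cvci repository, and the function name `cv_confidence_interval` and the `loss` argument are illustrative, not from the source.

```python
import numpy as np
from scipy import stats
from sklearn.model_selection import KFold

def cv_confidence_interval(model, X, y, loss, k=10, alpha=0.05, seed=0):
    """k-fold CV error estimate with a normal-approximation confidence interval."""
    kf = KFold(n_splits=k, shuffle=True, random_state=seed)
    losses, fold_vars = [], []
    for train_idx, val_idx in kf.split(X):
        model.fit(X[train_idx], y[train_idx])
        # One loss value per held-out point in this fold.
        fold_losses = loss(y[val_idx], model.predict(X[val_idx]))
        losses.append(fold_losses)
        fold_vars.append(np.var(fold_losses, ddof=1))  # per-fold ("out-of-fold") variance

    losses = np.concatenate(losses)
    n = losses.size
    r_hat = losses.mean()                    # k-fold CV point estimate of test error
    sigma_hat = np.sqrt(np.mean(fold_vars))  # assumed reading of sigma-hat^2_{n,out}
    half = stats.norm.ppf(1 - alpha / 2) * sigma_hat / np.sqrt(n)
    return r_hat, (r_hat - half, r_hat + half)

# Example loss: zero-one error for classification (returns one value per point).
zero_one = lambda y_true, y_pred: (y_true != y_pred).astype(float)
```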
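The App. K.1 settings quoted in the Experiment Setup row map onto scikit-learn roughly as follows. This is a hedged sketch: the excerpt does not name the neural-network library, so MLPClassifier stands in for the stated three-layer architecture, and the paper's penalty parameter α = 1 is translated to scikit-learn's C = 1/α = 1.0 for LogisticRegression (Ridge takes alpha directly).

```python
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.neural_network import MLPClassifier

# Random forests: depths as quoted from App. K.1, 100 trees each.
rf_higgs = RandomForestClassifier(max_depth=6, n_estimators=100)    # Higgs (classification)
rf_flights = RandomForestRegressor(max_depth=10, n_estimators=100)  # Flight Delays (regression)

# Penalized linear models: the paper states penalty parameter alpha = 1.
# Ridge takes alpha directly; LogisticRegression uses C = 1/alpha (an assumption
# about the intended mapping, since scikit-learn has no alpha for this model).
logreg = LogisticRegression(penalty="l2", C=1.0)
ridge = Ridge(alpha=1.0)

# Neural network classifier: three layers of 100 ReLU units, Adam with
# learning rate 1e-3, batch size 64, as quoted; MLPClassifier is one
# scikit-learn analogue, not necessarily what the authors used.
mlp = MLPClassifier(hidden_layer_sizes=(100, 100, 100), activation="relu",
                    solver="adam", learning_rate_init=1e-3, batch_size=64)
```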