Approximate Cross-Validation with Low-Rank Data in High Dimensions
Authors: Will Stephenson, Madeleine Udell, Tamara Broderick
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present numerical experiments that confirm our theoretical predictions and demonstrate the effectiveness of ACV in a range of high-dimensional settings. |
| Researcher Affiliation | Academia | 1 Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213 2 Department of Mathematics and Statistics, McMaster University, Hamilton, ON L8S 4K1, Canada 3 Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213 |
| Pseudocode | No | The paper describes methods in prose and mathematical formulations but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a statement about releasing source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | We also consider publicly available scRNA-seq data from [30]. |
| Dataset Splits | Yes | For each simulation, we generate N = 1000 training samples and Ntest = 100 test samples. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper states "All numerical experiments were performed using Python 3.9." but does not give version numbers for the other key libraries it mentions, such as scikit-learn. A Python version alone is not sufficient under the criteria. |
| Experiment Setup | Yes | We consider two types of synthetic data: Ridge regression and Logistic regression, both with a low-rank feature matrix. For each simulation, we generate N = 1000 training samples and Ntest = 100 test samples, with p covariates (p = 200 or p = 2000) and rank r (r = 1, 5, 10, 20). The noise level is set to σ = 0.1, 0.5, 1.0. For the iterative solver, we set the maximum number of iterations max_iter = 1000 and tolerance tol = 1e-6. The regularization parameter λ is chosen using 5-fold CV on the training data. |
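The synthetic ridge-regression setup quoted in the last row can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`make_low_rank_data`, `ridge_fit`, `five_fold_cv_lambda`) are hypothetical, the rank-r construction `X = U V^T` is one plausible reading of "low-rank feature matrix", and the candidate grid for λ is invented for the example.

```python
import numpy as np

def make_low_rank_data(n, p, r, sigma, rng):
    """Synthetic regression data whose feature matrix has exact rank r."""
    U = rng.standard_normal((n, r))
    V = rng.standard_normal((p, r))
    X = U @ V.T / np.sqrt(r)            # rank-r feature matrix
    w_true = rng.standard_normal(p) / np.sqrt(p)
    y = X @ w_true + sigma * rng.standard_normal(n)
    return X, y

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: w = (X^T X + lam * I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def five_fold_cv_lambda(X, y, lambdas, k=5):
    """Choose lambda by k-fold CV mean squared error on the training data."""
    n = X.shape[0]
    folds = np.array_split(np.arange(n), k)
    cv_errs = []
    for lam in lambdas:
        err = 0.0
        for fold in folds:
            train = np.ones(n, dtype=bool)
            train[fold] = False               # hold out this fold
            w = ridge_fit(X[train], y[train], lam)
            err += np.mean((X[fold] @ w - y[fold]) ** 2)
        cv_errs.append(err / k)
    return lambdas[int(np.argmin(cv_errs))]

# Dimensions as reported: N = 1000 training samples, p = 200, rank r = 5,
# noise sigma = 0.1; lambda chosen by 5-fold CV on the training data.
rng = np.random.default_rng(0)
X, y = make_low_rank_data(n=1000, p=200, r=5, sigma=0.1, rng=rng)
lam = five_fold_cv_lambda(X, y, lambdas=[0.01, 0.1, 1.0, 10.0])
```

The same generator covers the other reported settings (p = 2000; r = 1, 10, 20; σ = 0.5, 1.0) by changing the arguments; the logistic-regression variant would replace the Gaussian noise model with a Bernoulli response and an iterative solver.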