Can we globally optimize cross-validation loss? Quasiconvexity in ridge regression

Authors: Will Stephenson, Zachary Frangella, Madeleine Udell, Tamara Broderick

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically confirm our theory using simulated experiments.
Researcher Affiliation | Academia | William T. Stephenson (MIT, wtstephe@mit.edu); Zachary Frangella (Cornell, zjf4@cornell.edu); Madeleine Udell (Cornell, udell@cornell.edu); Tamara Broderick (MIT, tbroderick@mit.edu)
Pseudocode | No | The paper describes conceptual steps and refers to optimization algorithms but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]
Open Datasets | Yes | Our first dataset contains N = 2,938 observations of life expectancy, along with D = 20 covariates such as country of origin or alcohol use; see ?? for a full description. ... Our second dataset consists of recorded wine quality of N = 1,599 red wines. The goal is to predict wine quality from D = 11 observed covariates relating to the chemical properties of each wine; see ?? for a full description.
Dataset Splits | Yes | Here we study the leave-one-out CV (LOOCV) loss
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as CPU or GPU models.
Software Dependencies | No | The only software dependency for our experiments is NumPy [Harris et al., 2020], which uses the BSD 3-Clause New or Revised License. (This does not include a specific version number for NumPy itself, only the publication year of the paper describing it.)
Experiment Setup | Yes | We fix $D = 5$. To generate various spectra of $X$, we set $S_d = e^{\alpha d}/e^{\alpha D}$. For each $\alpha$, we sample 100 left-singular-value matrices $U$ from the uniform distribution... We fix a unit-norm $\theta \in \mathbb{R}^D$ and for each $U$, we generate data from a well-specified linear model, $y_n = \langle x_n, \theta \rangle + \varepsilon_n$, where the $\varepsilon_n$ are drawn i.i.d. from $\mathcal{N}(0, \sigma^2)$ with variance $\sigma^2 = 0.5$. In particular, for each setting of $U$, we generate 100 vectors $Y$. For each setting of $U$ and $Y$, we compute $L$ and check whether it is quasiconvex.
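The simulated setup quoted above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the sample size N = 20, the choice V = I for the right singular vectors, the specific θ, the α values, the λ grid, the ridge scaling (penalty λ‖θ‖² with no intercept), and all function names are hypothetical choices made here. The number of U and Y draws is reduced from the paper's 100 to keep the sketch fast, and quasiconvexity is only tested on a finite grid, so a pass is evidence rather than proof.

```python
import numpy as np


def make_design(rng, N, D, alpha):
    """Build X = U diag(S) with S_d = exp(alpha*d) / exp(alpha*D), d = 1..D.

    Assumption: V = I, and U is a random matrix with orthonormal columns
    obtained from the QR factorization of a Gaussian matrix.
    """
    S = np.exp(alpha * np.arange(1, D + 1)) / np.exp(alpha * D)
    U, _ = np.linalg.qr(rng.standard_normal((N, D)))
    return U * S  # scales column d of U by S_d, i.e. U @ diag(S)


def loocv_ridge(X, y, lam):
    """Exact leave-one-out CV loss for ridge regression (no intercept).

    Uses the standard shortcut: the held-out residual for point n equals
    (y_n - yhat_n) / (1 - H_nn), where H = X (X^T X + lam I)^{-1} X^T.
    """
    D = X.shape[1]
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(D), X.T)
    loo_resid = (y - H @ y) / (1.0 - np.diag(H))
    return np.mean(loo_resid ** 2)


def is_quasiconvex_on_grid(values, tol=1e-12):
    """A 1-D function is quasiconvex iff it is non-increasing, then
    non-decreasing. On a grid: once the values rise, they never fall."""
    diffs = np.diff(values)
    rising = np.flatnonzero(diffs > tol)
    if rising.size == 0:
        return True
    return bool(np.all(diffs[rising[0]:] >= -tol))


rng = np.random.default_rng(0)
N, D, sigma2 = 20, 5, 0.5            # N is an assumption; D and sigma^2 are from the text
theta = np.ones(D) / np.sqrt(D)      # an arbitrary unit-norm theta (assumption)
lambdas = np.logspace(-4, 4, 50)     # hypothetical grid of regularization strengths

for alpha in (0.0, 1.0, 2.0):        # hypothetical alpha values
    n_quasiconvex, n_total = 0, 0
    for _ in range(5):               # paper: 100 draws of U
        X = make_design(rng, N, D, alpha)
        for _ in range(5):           # paper: 100 draws of Y per U
            y = X @ theta + rng.normal(0.0, np.sqrt(sigma2), size=N)
            losses = np.array([loocv_ridge(X, y, lam) for lam in lambdas])
            n_quasiconvex += is_quasiconvex_on_grid(losses)
            n_total += 1
    print(f"alpha={alpha}: {n_quasiconvex}/{n_total} quasiconvex on the grid")
```

The closed-form leave-one-out residual avoids refitting the model N times per λ, which keeps a sweep over many (U, Y, λ) combinations cheap.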