On Optimal Interpolation in Linear Regression
Authors: Eduard Oravkin, Patrick Rebeschini
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide a closed-form expression for the interpolator that achieves this notion of optimality and show that it can be derived as the limit of preconditioned gradient descent with a specific initialization. We identify a regime where the minimum-norm interpolator provably generalizes arbitrarily worse than the optimal response-linear achievable interpolator that we introduce, and validate with numerical experiments that the notion of optimality we consider can be achieved by interpolating methods that only use the training data as input in the case of an isotropic prior. |
| Researcher Affiliation | Academia | Eduard Oravkin Department of Statistics University of Oxford eduard.oravkin@stats.ox.ac.uk Patrick Rebeschini Department of Statistics University of Oxford patrick.rebeschini@stats.ox.ac.uk |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. Methods are described mathematically and in text. |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] The code is in the supplementary material. |
| Open Datasets | No | In the examples, we consider a setting where x_i ∼ N(0, Σ) and w ∼ N(0, (r²/d) Φ). Therefore, throughout Section 5 we assume P_x = N(0, Σ) and P_w = N(0, (r²/d) Φ). |
| Dataset Splits | Yes | In regards to approximating the signal-to-noise ratio δ, we choose δ̃ that minimizes the cross-validated error on random subsets of the data. (...) for each δ̃ in {0.1, 0.2, . . . , 1, 2, . . . , 10}, we computed the validation error on a random, unseen tenth of the data and averaged over 10 times. The δ̃ with smallest cross-validated error was chosen. |
| Hardware Specification | Yes | We used the compute-optimized c2-standard-8 with 8 CPUs and 32GB RAM on Google Cloud to obtain the figures. |
| Software Dependencies | No | In the experiments (Figures 1, 2, 3, 4, 5, 6) we used the Graphical Lasso implementation of scikit-learn (Pedregosa et al., 2011) with parameter α = 0.25 (...). While 'scikit-learn' is mentioned, a specific version number for the software dependency is not provided. |
| Experiment Setup | Yes | In the experiments (Figures 1, 2, 3, 4, 5, 6) we used the Graphical Lasso implementation of scikit-learn (Pedregosa et al., 2011) with parameter α = 0.25 (α can also be cross-validated for even better performance) and in estimating δ, for each δ̃ in {0.1, 0.2, . . . , 1, 2, . . . , 10}, we computed the validation error on a random, unseen tenth of the data and averaged over 10 times. The δ̃ with smallest cross-validated error was chosen. (...) with r² = 1, σ² = 1, γ = 2, ψ₁ = 1/2, n = 3000 and ρ₁ = 1, ρ₂ → 0. |
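The Research Type row quotes the paper's comparison between the minimum-norm interpolator and the optimal response-linear interpolator it introduces. As a point of reference only, the sketch below shows the minimum-ℓ2-norm baseline on synthetic Gaussian data of the kind described in the Open Datasets row; it is not the authors' code, and the dimensions, seed, and isotropic covariance are illustrative assumptions.

```python
import numpy as np

# Illustrative dimensions; the paper's experiments use different, larger values.
rng = np.random.default_rng(0)
n, d = 100, 300              # overparameterized regime: d > n, so exact interpolation is possible
Sigma = np.eye(d)            # isotropic covariance as a stand-in for the paper's Σ

X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)  # rows x_i ~ N(0, Σ)
w_star = rng.normal(size=d) / np.sqrt(d)                  # illustrative ground-truth signal
y = X @ w_star + rng.normal(size=n)                       # noisy responses

# Minimum-ℓ2-norm interpolator: the least-squares solution of smallest norm,
# w = X^T (X X^T)^{-1} y, computed here via the pseudoinverse.
w_min_norm = np.linalg.pinv(X) @ y

print("training residual:", np.linalg.norm(X @ w_min_norm - y))  # ~0: it interpolates
print("parameter error  :", np.linalg.norm(w_min_norm - w_star))
```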
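The Experiment Setup row describes two reproducible ingredients: estimating the covariance with scikit-learn's Graphical Lasso at α = 0.25, and choosing δ̃ by grid search over {0.1, 0.2, ..., 1, 2, ..., 10}, scoring each candidate on a random, unseen tenth of the data averaged over 10 splits. The sketch below follows that description under stated assumptions; `fit_with_delta` is a hypothetical placeholder for a predictor parameterized by δ̃, not the paper's interpolator.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso


def estimate_covariance(X):
    """Covariance estimate via scikit-learn's Graphical Lasso with alpha = 0.25,
    as quoted in the Experiment Setup row (scikit-learn version unspecified there)."""
    return GraphicalLasso(alpha=0.25).fit(X).covariance_


def select_delta(X, y, fit_with_delta, n_repeats=10, rng=None):
    """Pick δ̃ from the grid {0.1, ..., 1, 2, ..., 10}: for each candidate, compute
    the validation error on a random, unseen tenth of the data, average over
    `n_repeats` splits, and return the candidate with the smallest averaged error.
    `fit_with_delta(X_tr, y_tr, delta)` is an assumed callable returning a
    coefficient vector; the paper's own estimator is not reproduced here."""
    rng = rng or np.random.default_rng(0)
    grid = [round(0.1 * k, 1) for k in range(1, 11)] + list(range(2, 11))
    n = X.shape[0]
    n_val = n // 10
    errors = {delta: 0.0 for delta in grid}
    for _ in range(n_repeats):
        perm = rng.permutation(n)
        val, train = perm[:n_val], perm[n_val:]
        for delta in grid:
            w = fit_with_delta(X[train], y[train], delta)
            errors[delta] += np.mean((X[val] @ w - y[val]) ** 2) / n_repeats
    return min(errors, key=errors.get)
```

Any δ-parameterized predictor (for instance a ridge-style fit) can be plugged in as `fit_with_delta` to exercise the selection loop end to end.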