Ridge Regression: Structure, Cross-Validation, and Sketching
Authors: Sifan Liu, Edgar Dobriban
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results are illustrated by simulations and by analyzing empirical data. We provide proofs and additional simulations in the Appendix. Code reproducing the experiments in the paper is available at https://github.com/liusf15/RidgeRegression. |
| Researcher Affiliation | Academia | Sifan Liu Department of Statistics Stanford University Stanford, CA 94305, USA sfliu@stanford.edu, Edgar Dobriban Department of Statistics University of Pennsylvania Philadelphia, PA 19104, USA dobriban@wharton.upenn.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code reproducing the experiments in the paper is available at https://github.com/liusf15/RidgeRegression. |
| Open Datasets | Yes | Left: Cross-validation on the Million Song Dataset (MSD, Bertin-Mahieux et al., 2011). For the test error, we train on 1000 training datapoints and fit on 9000 test datapoints. Right: Cross-validation on the flights dataset (Wickham, 2018). For the test error, we train on 300 datapoints and fit on 27000 test datapoints. Suppose we split the n datapoints (samples) into K equal-sized subsets, each containing n_0 = n/K samples. We use the k-th subset (X_k, Y_k) as the validation set and the other K−1 subsets (X_{−k}, Y_{−k}), with total sample size n_1 = (K−1)n/K, as the training set. Left: we generate a training set (n = 1000, p = 700, γ = 0.7, α = σ = 1) and a test set (n_test = 500) from the same distribution. We split the training set into K = 5 equally sized folds and do cross-validation. We take n = 500, p = 550, α = 20, σ = 1, K = 5. For train-test validation, we take 80% of the samples as the training set and the remaining 20% as the test set. (A cross-validation sketch follows the table.) |
| Dataset Splits | Yes | For the error bar, we take n = 1000, p = 90, K = 5, and average over 90 different sub-datasets. For the error bar, we take n = 300, p = 21, K = 5, and average over 180 different sub-datasets. Suppose we split the n datapoints (samples) into K equal-sized subsets, each containing n_0 = n/K samples. We use the k-th subset (X_k, Y_k) as the validation set and the other K−1 subsets (X_{−k}, Y_{−k}), with total sample size n_1 = (K−1)n/K, as the training set. We split the training set into K = 5 equally sized folds and do cross-validation. We take n = 500, p = 550, α = 20, σ = 1, K = 5. For train-test validation, we take 80% of the samples as the training set and the remaining 20% as the test set. |
| Hardware Specification | No | The paper discusses computational complexity in terms of flop counts but does not specify any particular hardware (e.g., CPU, GPU models, memory) used for the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | Figure 1: Left: γ = p/n = 0.2; right: γ = 2. The data matrix X has iid Gaussian entries. The coefficient β has distribution β ∼ N(0, I_p/p), while the noise ε ∼ N(0, I_p). Figure 2: For the error bar, we take n = 1000, p = 90, K = 5, and average over 90 different sub-datasets. For the test error, we train on 1000 training datapoints and fit on 9000 test datapoints. Figure 3: Primal orthogonal sketching with n = 500, γ = 5, λ = 1.5, α = 3, σ = 1. Left: MSE of primal sketching normalized by the MSE of ridge regression. The error bar is the standard deviation over 10 repetitions. Figure 4: Right: Gaussian dual sketch when there is no noise. γ = 0.4, α = 1, λ = 1 (both for original and sketching). Standard error over 50 experiments. Figure 7: Left: we generate a training set (n = 1000, p = 700, γ = 0.7, α = σ = 1) and a test set (n_test = 500) from the same distribution. We split the training set into K = 5 equally sized folds and do cross-validation. (A sketching example follows the table.) |
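
The Open Datasets and Dataset Splits rows quote the paper's K-fold cross-validation procedure: split the n samples into K folds of size n_0 = n/K, hold out fold k for validation, and train ridge regression on the remaining n_1 = (K−1)n/K samples. Below is a minimal NumPy sketch of that procedure, not the authors' implementation: the ridge solver with penalty nλ, the synthetic data generator, and the candidate grid for λ are assumptions added only to make the example self-contained.

```python
import numpy as np

def ridge_fit(X, Y, lam):
    """Ridge estimate beta_hat = (X^T X + n*lam*I_p)^{-1} X^T Y (assumed penalty scaling)."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T @ Y)

def kfold_cv_mse(X, Y, lam, K=5, seed=0):
    """Average validation MSE of ridge over K equal-sized folds (each of size n0 = n/K)."""
    n = X.shape[0]
    folds = np.array_split(np.random.default_rng(seed).permutation(n), K)
    errors = []
    for k in range(K):
        val = folds[k]                                                   # k-th subset: validation set
        train = np.concatenate([folds[j] for j in range(K) if j != k])   # other K-1 subsets: training set
        beta_hat = ridge_fit(X[train], Y[train], lam)
        errors.append(np.mean((Y[val] - X[val] @ beta_hat) ** 2))
    return np.mean(errors)

# Synthetic data loosely matching the quoted simulation (n = 1000, p = 700, alpha = sigma = 1).
rng = np.random.default_rng(0)
n, p, alpha, sigma = 1000, 700, 1.0, 1.0
X = rng.standard_normal((n, p))
beta = rng.normal(0.0, alpha / np.sqrt(p), p)     # beta ~ N(0, alpha^2 I_p / p)
Y = X @ beta + sigma * rng.standard_normal(n)

lambdas = np.logspace(-2, 1, 10)                  # candidate grid for lambda (assumed)
best_lam = min(lambdas, key=lambda lam: kfold_cv_mse(X, Y, lam, K=5))
print("lambda selected by 5-fold CV:", best_lam)
```

The quoted train-test validation is the K = 1 analogue of the same loop: hold out a single 20% split as the validation set instead of averaging over K folds.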
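The Experiment Setup row also refers to primal orthogonal sketching and Gaussian dual sketching of the ridge estimator (Figures 3 and 4). The sketch below shows one plausible way to set up such an experiment in NumPy; the functions, the sqrt(n/m) and 1/sqrt(d) scalings, and the sketch dimensions m = n/2 and d = p/2 are illustrative assumptions, not the paper's exact estimators or settings.

```python
import numpy as np

def ridge(X, Y, lam):
    """Standard ridge estimate beta_hat = (X^T X + n*lam*I_p)^{-1} X^T Y."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T @ Y)

def primal_sketched_ridge(X, Y, lam, m, rng):
    """Primal sketching: replace X^T X by (SX)^T (SX) for an m x n orthogonal sketch S.

    The sqrt(n/m) scaling of S is an assumed convention, not taken from the paper.
    """
    n, p = X.shape
    Q, _ = np.linalg.qr(rng.standard_normal((n, m)))  # n x m with orthonormal columns
    S = np.sqrt(n / m) * Q.T                          # m x n orthogonal sketching matrix
    SX = S @ X
    return np.linalg.solve(SX.T @ SX + n * lam * np.eye(p), X.T @ Y)

def dual_sketched_ridge(X, Y, lam, d, rng):
    """Dual (Gaussian) sketching: approximate X X^T by (XL)(XL)^T for a p x d Gaussian L.

    One plausible formulation; the estimator analyzed in the paper may differ in details.
    """
    n, p = X.shape
    L = rng.standard_normal((p, d)) / np.sqrt(d)      # assumed 1/sqrt(d) scaling
    XL = X @ L
    return X.T @ np.linalg.solve(XL @ XL.T + n * lam * np.eye(n), Y)

# Data loosely following the Figure 3 description: n = 500, gamma = 5, lambda = 1.5, alpha = 3, sigma = 1.
# (The Figure 4 dual-sketch experiment instead uses gamma = 0.4, alpha = 1, lambda = 1 and no noise.)
rng = np.random.default_rng(0)
n, gamma, lam, alpha, sigma = 500, 5.0, 1.5, 3.0, 1.0
p = int(gamma * n)
X = rng.standard_normal((n, p))
beta = rng.normal(0.0, alpha / np.sqrt(p), p)
Y = X @ beta + sigma * rng.standard_normal(n)

mse = lambda b: np.sum((b - beta) ** 2)               # estimation MSE of a candidate estimator
mse_full = mse(ridge(X, Y, lam))
print("primal sketch / full ridge MSE:", mse(primal_sketched_ridge(X, Y, lam, m=n // 2, rng=rng)) / mse_full)
print("dual sketch   / full ridge MSE:", mse(dual_sketched_ridge(X, Y, lam, d=p // 2, rng=rng)) / mse_full)
```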