Ridge Regression: Structure, Cross-Validation, and Sketching

Authors: Sifan Liu, Edgar Dobriban

ICLR 2020

Reproducibility Variable Result LLM Response
Research Type Experimental Our results are illustrated by simulations and by analyzing empirical data. We provide proofs and additional simulations in the Appendix. Code reproducing the experiments in the paper is available at https://github.com/liusf15/RidgeRegression.
Researcher Affiliation Academia Sifan Liu, Department of Statistics, Stanford University, Stanford, CA 94305, USA, sfliu@stanford.edu; Edgar Dobriban, Department of Statistics, University of Pennsylvania, Philadelphia, PA 19104, USA, dobriban@wharton.upenn.edu
Pseudocode No The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code Yes Code reproducing the experiments in the paper is available at https://github.com/liusf15/RidgeRegression.
Open Datasets Yes Left: Cross-validation on the Million Song Dataset (MSD, Bertin-Mahieux et al., 2011). For the test error, we train on 1000 training datapoints and fit on 9000 test datapoints. Right: Cross-validation on the flights dataset (Wickham, 2018). For the test error, we train on 300 datapoints and fit on 27000 test datapoints. Suppose we split the n datapoints (samples) into K equal-sized subsets, each containing n0 = n/K samples. We use the k-th subset (Xk, Yk) as the validation set and the other K − 1 subsets (X−k, Y−k), with total sample size n1 = (K − 1)n/K, as the training set. Left: we generate a training set (n = 1000, p = 700, γ = 0.7, α = σ = 1) and a test set (ntest = 500) from the same distribution. We split the training set into K = 5 equally sized folds and do cross-validation. We take n = 500, p = 550, α = 20, σ = 1, K = 5. As for train-test validation, we take 80% of the samples as the training set and the remaining 20% as the test set.
Dataset Splits Yes For the error bar, we take n = 1000, p = 90, K = 5, and average over 90 different sub-datasets. For the error bar, we take n = 300, p = 21, K = 5, and average over 180 different sub-datasets. Suppose we split the n datapoints (samples) into K equal-sized subsets, each containing n0 = n/K samples. We use the k-th subset (Xk, Yk) as the validation set and the other K − 1 subsets (X−k, Y−k), with total sample size n1 = (K − 1)n/K, as the training set. We split the training set into K = 5 equally sized folds and do cross-validation. We take n = 500, p = 550, α = 20, σ = 1, K = 5. As for train-test validation, we take 80% of the samples as the training set and the remaining 20% as the test set.
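The K-fold split quoted above (K equal folds of n0 = n/K samples; fold k held out, the remaining K − 1 folds of n1 = (K − 1)n/K samples used for training) can be sketched as follows. This is a minimal illustration, not the authors' code; the function name and random-seed handling are our own assumptions.

```python
import numpy as np

def kfold_indices(n, K, seed=None):
    """Yield (train_idx, val_idx) pairs for K-fold cross-validation.

    Assumes n is divisible by K, matching the paper's setting where
    each fold has exactly n0 = n/K samples and the training portion
    has n1 = (K - 1)n/K samples.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    folds = np.array_split(perm, K)  # K equal-sized folds
    for k in range(K):
        val_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(K) if j != k])
        yield train_idx, val_idx

# With n = 1000 and K = 5 (as in the MSD experiment), each validation
# fold has n0 = 200 samples and each training set has n1 = 800.
for train_idx, val_idx in kfold_indices(1000, 5, seed=0):
    assert len(val_idx) == 200 and len(train_idx) == 800
```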
Hardware Specification No The paper discusses computational complexity in terms of flop counts but does not specify any particular hardware (e.g., CPU, GPU models, memory) used for the experiments.
Software Dependencies No The paper does not provide specific software dependencies with version numbers.
Experiment Setup Yes Figure 1: Left: γ = p/n = 0.2; right: γ = 2. The data matrix X has iid Gaussian entries. The coefficient β has distribution β ∼ N(0, Ip/p), while the noise ε ∼ N(0, Ip). Figure 2: For the error bar, we take n = 1000, p = 90, K = 5, and average over 90 different sub-datasets. For the test error, we train on 1000 training datapoints and fit on 9000 test datapoints. Figure 3: Primal orthogonal sketching with n = 500, γ = 5, λ = 1.5, α = 3, σ = 1. Left: MSE of primal sketching normalized by the MSE of ridge regression. The error bar is the standard deviation over 10 repetitions. Figure 4: Right: Gaussian dual sketch when there is no noise. γ = 0.4, α = 1, λ = 1 (both for original and sketching). Standard error over 50 experiments. Figure 7: Left: we generate a training set (n = 1000, p = 700, γ = 0.7, α = σ = 1) and a test set (ntest = 500) from the same distribution. We split the training set into K = 5 equally sized folds and do cross-validation.
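The Figure 1 simulation setup quoted above (iid Gaussian design, β ∼ N(0, Ip/p), Gaussian noise, aspect ratio γ = p/n) can be sketched as below. The n·λ scaling inside the ridge solve and the seed are our assumptions for illustration; the paper may normalize the penalty differently.

```python
import numpy as np

rng = np.random.default_rng(0)

# Aspect ratio gamma = p/n = 0.2, as in the left panel of Figure 1.
n, p, lam = 1000, 200, 1.0

X = rng.standard_normal((n, p))            # iid Gaussian design
beta = rng.standard_normal(p) / np.sqrt(p)  # beta ~ N(0, I_p / p)
eps = rng.standard_normal(n)               # Gaussian noise
y = X @ beta + eps

# Ridge estimator: (X'X + n*lam*I)^{-1} X'y (penalty scaling assumed).
beta_hat = np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T @ y)

# Estimation error of the ridge fit relative to the true coefficients.
mse = np.mean((beta_hat - beta) ** 2)
```

Averaging `mse` over independent repetitions would produce the kind of error bars reported for Figures 3 and 4.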