Solving Ridge Regression using Sketched Preconditioned SVRG

Authors: Alon Gonen, Francesco Orabona, Shai Shalev-Shwartz

ICML 2016

Reproducibility assessment (each variable lists its result, followed by the supporting LLM response):
Research Type: Experimental
LLM Response: "The Empirical Gain of Sketched Preconditioning" (Section 6): In this section we empirically demonstrate the gain of our method. We consider both regression problems and binary classification tasks, where the square loss serves as a surrogate for the zero-one loss.
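For context, the ridge regression objective the paper targets can be written in the standard form below; the exact 1/n scaling is an assumption about the paper's notation, not a verbatim quote.

```latex
\min_{w \in \mathbb{R}^d} \; \frac{1}{n} \sum_{i=1}^{n} \left( w^{\top} x_i - y_i \right)^2 + \lambda \| w \|_2^2
```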
Researcher Affiliation: Collaboration
LLM Response: Alon Gonen (ALONGNN@CS.HUJI.AC.IL), The Hebrew University; Francesco Orabona (FRANCESCO@ORABONA.COM), Yahoo Research, 229 West 43rd Street, New York, NY 10036, USA; Shai Shalev-Shwartz (SHAIS@CS.HUJI.AC.IL), The Hebrew University.
Pseudocode: Yes
LLM Response: Algorithm 1, SVRG (Xiao & Zhang, 2014) ... Algorithm 2, Block Lanczos method (Musco & Musco, 2015) ... Algorithm 3, Sketched Preconditioned SVRG.
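As a reference point, here is a minimal sketch of the plain SVRG loop (Johnson & Zhang, 2013) specialized to the ridge objective. Parameter names are hypothetical; this is not the authors' exact pseudocode, and it deliberately omits the low-rank sketched preconditioner that distinguishes Algorithm 3.

```python
import numpy as np

def svrg_ridge(X, y, lam, eta, m, epochs):
    """Plain SVRG on (1/n) * sum_i (w^T x_i - y_i)^2 + lam * ||w||^2.
    A sketch with hypothetical parameter names, not the paper's Algorithm 3,
    which additionally preconditions these updates."""
    n, d = X.shape
    w = np.zeros(d)
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        w_ref = w.copy()
        # Full gradient at the epoch's reference point.
        full_grad = (2.0 / n) * X.T @ (X @ w_ref - y) + 2.0 * lam * w_ref
        for _ in range(m):
            i = rng.integers(n)
            x_i = X[i]
            # Stochastic gradients at the current iterate and at the reference.
            g_w = 2.0 * (x_i @ w - y[i]) * x_i + 2.0 * lam * w
            g_ref = 2.0 * (x_i @ w_ref - y[i]) * x_i + 2.0 * lam * w_ref
            # Variance-reduced step.
            w -= eta * (g_w - g_ref + full_grad)
    return w
```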
Open Source Code: No
LLM Response: The paper contains no explicit statement or link indicating that source code for the method is publicly available.
Open Datasets: Yes
LLM Response: We use the following datasets:
- MNIST: a subset of MNIST corresponding to the digits 4 and 7, where the task is to distinguish between the two digits. Here, n = 12107, d = 784.
- RCV1: the Reuters RCV1 collection. Here, n = 20242, d = 47236, and we consider a standard binary document classification task.
- CIFAR-10: here, n = 50000, d = 3072. Following Frostig et al. (2015), the classification task is to distinguish the animal categories from the automotive ones.
- real-sim: here, n = 72309, d = 20958, and we consider a standard binary document classification task.
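All four datasets are standard public benchmarks. As one hedged way to recover the MNIST 4-vs-7 subset, the sketch below uses scikit-learn's fetch_openml; the paper does not specify a loader, and the assumption that the OpenML copy stores the canonical 60000 training rows first is ours.

```python
import numpy as np
from sklearn.datasets import fetch_openml

# Download MNIST and restrict to the 60000 canonical training rows
# (assumption: the OpenML copy stores the training split first).
mnist = fetch_openml("mnist_784", version=1, as_frame=False)
X_all, y_all = mnist.data[:60000], mnist.target[:60000]

# Keep only the digits 4 and 7, as in the paper's subset.
mask = np.isin(y_all, ["4", "7"])
X = X_all[mask].astype(np.float64)
y = np.where(y_all[mask] == "4", 1.0, -1.0)  # +/-1 labels for the square-loss surrogate
print(X.shape)  # expected (12107, 784), matching the reported n and d
```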
Dataset Splits: No
LLM Response: The paper reports dataset sizes and experimental parameters such as the step size and the number of epochs, but it does not specify explicit training/validation/test splits or a cross-validation methodology.
Hardware Specification: No
LLM Response: The paper does not provide any specific details about the hardware used to run the experiments, such as GPU or CPU models, memory, or cloud instance specifications.
Software Dependencies: No
LLM Response: The paper mentions using variants of SVRG but does not list any specific software libraries, frameworks, or version numbers required to reproduce the experiments.
Experiment Setup: Yes
LLM Response: To minimally affect the inherent condition number, we added only a slight amount of regularization, namely λ = 10^-8. The loss used is the square loss. The step size, η, is optimally tuned for each method. Similarly to previous work on SVRG (Xiao & Zhang, 2014; Johnson & Zhang, 2013), the size of each epoch, m, is proportional to the number of points, n. We minimally preprocessed the data by average normalization: each instance vector is divided by the average ℓ2-norm of the instances. The number of epochs is up to 60. Note that in all cases we choose a small preconditioning parameter, namely k = 30.
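A small sketch of the described average normalization, assuming it means dividing every instance vector by the mean ℓ2-norm computed over all instances, together with the reported constants:

```python
import numpy as np

def average_normalize(X):
    # Divide each instance vector by the average l2-norm of all instances,
    # as described in the paper's experiment setup.
    avg_norm = np.linalg.norm(X, axis=1).mean()
    return X / avg_norm

lam = 1e-8  # the paper's slight regularization, lambda = 10^-8
k = 30      # the small preconditioning parameter used in all experiments
```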