Solving Ridge Regression using Sketched Preconditioned SVRG
Authors: Alon Gonen, Francesco Orabona, Shai Shalev-Shwartz
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6. The Empirical Gain of Sketched Preconditioning: In this section we empirically demonstrate the gain of our method. We consider both regression problems and binary classification tasks, where the square loss serves as a surrogate for the zero-one loss. |
| Researcher Affiliation | Collaboration | Alon Gonen (ALONGNN@CS.HUJI.AC.IL), The Hebrew University; Francesco Orabona (FRANCESCO@ORABONA.COM), Yahoo Research, 229 West 43rd Street, New York, NY 10036, USA; Shai Shalev-Shwartz (SHAIS@CS.HUJI.AC.IL), The Hebrew University |
| Pseudocode | Yes | Algorithm 1 SVRG (Xiao & Zhang, 2014) ... Algorithm 2 Block Lanczos method (Musco & Musco, 2015) ... Algorithm 3 Sketched Preconditioned SVRG |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for their methodology is made publicly available. |
| Open Datasets | Yes | We use the following datasets: MNIST: A subset of MNIST, corresponding to the digits 4 and 7, where the task is to distinguish between the two digits. Here, n = 12107, d = 784. RCV1: The Reuters RCV1 collection. Here, n = 20242, d = 47236, and we consider a standard binary document classification task. CIFAR-10: Here, n = 50000, d = 3072. Following Frostig et al. (2015), the classification task is to distinguish the animal categories from the automotive ones. real-sim: Here, n = 72309, d = 20958, and we consider a standard binary document classification task. |
| Dataset Splits | No | The paper describes dataset sizes and experimental parameters like step size and number of epochs, but it does not specify explicit training/validation/test splits or cross-validation methodology. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU or CPU models, memory, or cloud instance specifications. |
| Software Dependencies | No | The paper mentions using variants of SVRG but does not list any specific software libraries, frameworks, or their version numbers required to reproduce the experiments. |
| Experiment Setup | Yes | To minimally affect the inherent condition number, we added only a slight amount of regularization, namely, λ = 10⁻⁸. The loss used is the square loss. The step size, η, is optimally tuned for each method. Similarly to previous work on SVRG (Xiao & Zhang, 2014; Johnson & Zhang, 2013), the size of each epoch, m, is proportional to the number of points, n. We minimally preprocessed the data by average normalization: each instance vector is divided by the average ℓ2-norm of the instances. The number of epochs is up to 60. Note that in all cases we choose a small preconditioning parameter, namely k = 30. |
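The setup quoted above (ridge regression with the square loss, λ = 10⁻⁸, average-norm preprocessing, and an SVRG epoch length proportional to n) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the paper's method additionally applies a sketched preconditioner (Algorithm 3), which is omitted here, and the step size `eta` below is illustrative rather than the optimally tuned value the authors use.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 20  # small synthetic stand-in for the paper's datasets
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.01 * rng.standard_normal(n)

# Average normalization (as described in the setup row): divide each
# instance vector by the average l2-norm of the instances.
X = X / np.linalg.norm(X, axis=1).mean()

lam = 1e-8   # slight regularization, as in the paper's experiments
eta = 0.1    # step size; the paper tunes eta per method
m = n        # epoch size proportional to the number of points
w = np.zeros(d)

def grad_i(w, i):
    # Gradient of the i-th ridge-regression term: square loss + l2 penalty.
    return (X[i] @ w - y[i]) * X[i] + lam * w

for epoch in range(30):
    # SVRG (Algorithm 1 in the paper, from Xiao & Zhang, 2014):
    # compute the full gradient at a snapshot, then run m
    # variance-reduced stochastic steps.
    w_snap = w.copy()
    full_grad = (X.T @ (X @ w_snap - y)) / n + lam * w_snap
    for _ in range(m):
        i = rng.integers(n)
        g = grad_i(w, i) - grad_i(w_snap, i) + full_grad
        w -= eta * g

rel_residual = np.linalg.norm(X @ w - y) / np.linalg.norm(y)
```

After enough epochs the relative residual approaches the noise floor of the synthetic data; on ill-conditioned problems this plain SVRG loop is exactly where the paper's sketched preconditioning delivers its speedup.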