An Iterative, Sketching-based Framework for Ridge Regression

Authors: Agniva Chowdhury, Jiasen Yang, Petros Drineas

ICML 2018

Reproducibility Variable Result LLM Response
Research Type Experimental Our empirical evaluations verify our theoretical results on both real and synthetic data. 4. Empirical Evaluation: We perform experiments on the ARCENE dataset (Guyon et al., 2005) from the UCI repository (Lichman, 2013). The design matrix contains 200 samples with 10,000 real-valued features; we normalize the entries to be within the interval [0, 1]. The response vector consists of ±1 labels. We also perform experiments on synthetic data generated as in Chen et al. (2015); see Appendix H for details. In our experiments, we compare three different choices of sampling probabilities: selecting columns (i) uniformly at random, (ii) proportional to their leverage scores, or (iii) proportional to their ridge leverage scores. For each sampling method, we run Algorithm 1 for 50 iterations with a variety of sketch sizes, and measure (i) the relative error of the solution vector, ‖x̂ − x*‖₂ / ‖x*‖₂, where x* is the true optimal solution, and (ii) the objective sub-optimality, f(x̂)/f(x*) − 1, where f(x) = ‖Ax − b‖₂² + λ‖x‖₂² is the objective function for the ridge-regression problem. The results are shown in Figure 1. Figures 1a and 1b plot the relative error of the solution vector and the objective sub-optimality (for a fixed sketch size) as the iterative algorithm progresses. Figure 1c plots the relative error of the solution with respect to varying sketch sizes (the plots for objective sub-optimality are analogous and thus omitted). We observe that both the solution error and the objective sub-optimality decay exponentially as our iterative algorithm progresses.
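The two evaluation metrics quoted above follow directly from the stated objective f(x) = ‖Ax − b‖₂² + λ‖x‖₂². A minimal NumPy sketch of how they would be computed (function names are our own, not from the paper):

```python
import numpy as np

def ridge_objective(A, b, lam, x):
    # f(x) = ||Ax - b||_2^2 + lam * ||x||_2^2
    r = A @ x - b
    return r @ r + lam * (x @ x)

def relative_error(x_hat, x_star):
    # ||x_hat - x_star||_2 / ||x_star||_2
    return np.linalg.norm(x_hat - x_star) / np.linalg.norm(x_star)

def objective_suboptimality(A, b, lam, x_hat, x_star):
    # f(x_hat) / f(x_star) - 1, which is zero iff x_hat attains the optimum
    return ridge_objective(A, b, lam, x_hat) / ridge_objective(A, b, lam, x_star) - 1.0
```

Here x_star would be the exact ridge solution (A^T A + λI)^{-1} A^T b, computed once for reference.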
Researcher Affiliation Academia Agniva Chowdhury¹, Jiasen Yang¹, Petros Drineas². ¹Department of Statistics, Purdue University, West Lafayette, IN; ²Department of Computer Science, Purdue University, West Lafayette, IN.
Pseudocode Yes Algorithm 1 Iterative, sketching-based ridge regression
Input: A ∈ ℝ^(n×d), b ∈ ℝ^n, λ > 0; number of iterations t > 0; sketching matrix S ∈ ℝ^(d×s)
Initialize: b^(0) ← b, x̃^(0) ← 0_d, y^(0) ← 0_n
for j = 1 to t do
  b^(j) ← b^(j−1) − λ y^(j−1) − A x̃^(j−1)
  y^(j) ← (A S Sᵀ Aᵀ + λ I_n)^(−1) b^(j)
  x̃^(j) ← Aᵀ y^(j)
end for
Output: Approximate solution vector x̂ = Σ_{j=1}^{t} x̃^(j)
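Algorithm 1 above translates almost line-for-line into NumPy. A sketch under the stated assumptions (the sketched n×n system is pre-factored once since S is fixed across iterations; this caching is an implementation choice, not something the pseudocode prescribes):

```python
import numpy as np

def iterative_sketched_ridge(A, b, lam, t, S):
    """Sketch of Algorithm 1: iterative, sketching-based ridge regression.

    A: n x d design matrix, b: length-n response, lam: ridge parameter
    lambda > 0, t: number of iterations, S: d x s sketching matrix.
    """
    n, d = A.shape
    AS = A @ S                        # n x s sketched matrix
    M = AS @ AS.T + lam * np.eye(n)   # A S S^T A^T + lam I_n (fixed across iterations)
    b_j = b.astype(float).copy()      # b^(0)
    x_tilde = np.zeros(d)             # x~^(0)
    y = np.zeros(n)                   # y^(0)
    x_hat = np.zeros(d)               # running sum of the x~^(j)
    for _ in range(t):
        b_j = b_j - lam * y - A @ x_tilde   # b^(j) <- b^(j-1) - lam y^(j-1) - A x~^(j-1)
        y = np.linalg.solve(M, b_j)         # y^(j) <- (A S S^T A^T + lam I)^(-1) b^(j)
        x_tilde = A.T @ y                   # x~^(j) <- A^T y^(j)
        x_hat += x_tilde                    # accumulate partial solutions
    return x_hat
```

As a sanity check, taking S to be the d×d identity makes S Sᵀ = I, so the first iteration already returns the exact ridge solution Aᵀ(AAᵀ + λI)^(−1)b and subsequent corrections vanish.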
Open Source Code No The paper does not contain any explicit statements or links indicating that source code for the described methodology is publicly available.
Open Datasets Yes We perform experiments on the ARCENE dataset (Guyon et al., 2005) from the UCI repository (Lichman, 2013). The design matrix contains 200 samples with 10,000 real-valued features; we normalize the entries to be within the interval [0, 1]. The response vector consists of ±1 labels. We also perform experiments on synthetic data generated as in Chen et al. (2015); see Appendix H for details.
Dataset Splits No The paper states it uses the ARCENE dataset and synthetic data, and mentions 'design matrix' and 'response vector'. It discusses calculating 'relative error of the solution vector' and 'objective sub-optimality' on the data, but it does not specify how the data was split into training, validation, or test sets with percentages or sample counts.
Hardware Specification No The paper does not provide any specific details about the hardware used to run the experiments (e.g., CPU, GPU models, memory, or cloud instances).
Software Dependencies No The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, or frameworks).
Experiment Setup Yes For these experiments, we have set the regularization parameter λ = 10 in the ridge regression objective as well as when computing the ridge leverage score sampling probabilities.
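The quoted setup uses λ = 10 both in the objective and when computing the ridge leverage score sampling probabilities. A minimal sketch of column ridge leverage scores, assuming the standard definition τ_j = a_jᵀ(AAᵀ + λI)^(−1)a_j for the j-th column a_j (the paper's exact formula may differ; the function name is our own):

```python
import numpy as np

def ridge_leverage_probs(A, lam):
    """Column ridge leverage scores normalized into sampling probabilities.

    tau_j = a_j^T (A A^T + lam I_n)^{-1} a_j for each column a_j of the
    n x d matrix A; p_j = tau_j / sum_k tau_k.
    """
    n, d = A.shape
    K_inv = np.linalg.inv(A @ A.T + lam * np.eye(n))
    # tau_j = a_j^T K_inv a_j for every column j, in one einsum
    tau = np.einsum('ij,ik,kj->j', A, K_inv, A)
    return tau / tau.sum()
```

With λ = 0 these reduce to the ordinary (statistical) leverage scores, which is why the report distinguishes sampling options (ii) and (iii).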