Large Scale Constrained Linear Regression Revisited: Faster Algorithms via Preconditioning

Authors: Di Wang, Jinhui Xu

AAAI 2018

Each entry below gives a reproducibility variable, its result, and the LLM response.
Research Type: Experimental
  LLM Response: "Experiments on benchmark and synthetic datasets suggest that our methods indeed outperform existing ones considerably in both the low and high precision cases."
Researcher Affiliation: Academia
  LLM Response: Di Wang, Department of Computer Science and Engineering, State University of New York at Buffalo; Jinhui Xu, Department of Computer Science and Engineering, State University of New York at Buffalo.
Pseudocode: Yes
  LLM Response: Algorithm 2 HD-pwBatchSGD(A, b, x0, T, r, η, s)
  Input: x0 is the initial point, r is the batch size, η is the fixed step size, s is the sketch size, and T is the iteration number.
  1: Compute R ∈ R^{d×d} which makes AR^{-1} an (O(d), O(1), 2)-conditioned basis of A as in Algorithm 1, using a sketch matrix S of size s × n.
  2: Compute HDA and HDb, where HD is a Randomized Hadamard Transform.
  3: for t = 1, ..., T do
  4:   Randomly sample an index set τ_t of size r, where each index in τ_t is sampled i.i.d. uniformly.
  5:   c_{τ_t} = (2n/r) Σ_{j∈τ_t} (HDA)_j^T [(HDA)_j x_{t-1} - (HDb)_j] = (2n/r) (HDA)_{τ_t}^T [(HDA)_{τ_t} x_{t-1} - (HDb)_{τ_t}]
  6:   x_t = argmin_{x∈W} (1/2) ||R(x_{t-1} - x)||_2^2 + η ⟨c_{τ_t}, x⟩ = P_W(x_{t-1} - η R^{-1}(R^{-1})^T c_{τ_t})
  7: end for
  8: return x_avg^T = (1/T) Σ_{t=1}^T x_t
  (An illustrative NumPy sketch of this algorithm appears after the table.)
Open Source Code: No
  LLM Response: The paper only provides links to third-party code used, not the source code for the methodology described in this paper.
Open Datasets: Yes
  LLM Response: "The datasets Year and Buzz come from UCI Machine Learning Repository (Lichman 2013)."
Dataset Splits: No
  LLM Response: The paper does not explicitly provide details about training, validation, or test dataset splits or cross-validation setup.
Hardware Specification: No
  LLM Response: The paper does not explicitly describe the hardware used to run its experiments.
Software Dependencies: No
  LLM Response: The paper mentions software like MATLAB, Count Sketch, SGD, and Adagrad but does not provide specific version numbers for these software components.
Experiment Setup: Yes
  LLM Response: We use the step size as described in Theorem 2 (note that we assume that the step size is already known in advance). For pwGradient, i.e., Algorithm 4, we set η = 1/2 as the step size. In each experiment, the initial value x0 is set to be the zero vector. (A hypothetical usage example reflecting this setup follows the algorithm sketch below.)
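
For readers who want a concrete view of the pseudocode above, here is a minimal NumPy sketch of the HD-pwBatchSGD update. It is not the authors' implementation: the sketch-based conditioned-basis step (Algorithm 1) is replaced by an exact QR factorization, the constraint set W is taken to be all of R^d (so the projection P_W is the identity), and the number of rows n is assumed to be a power of two so the Hadamard transform applies without padding.

```python
import numpy as np
from scipy.linalg import hadamard  # dense Hadamard matrix; adequate for a small demo


def hd_pw_batch_sgd(A, b, x0, T, r, eta, rng=None):
    """Illustrative, unconstrained version of the HD-pwBatchSGD update.

    Assumptions (not the paper's exact procedure): R comes from an exact QR
    factorization instead of the sketch-based conditioned basis of
    Algorithm 1, the projection P_W is the identity, and A has a
    power-of-two number of rows.
    """
    rng = np.random.default_rng(rng)
    n, d = A.shape

    # Step 1 (simplified): R such that A @ inv(R) is orthonormal.
    _, R = np.linalg.qr(A)
    R_inv = np.linalg.inv(R)

    # Step 2: randomized Hadamard transform HD applied to A and b.
    H = hadamard(n) / np.sqrt(n)            # orthonormal Hadamard matrix
    D = rng.choice([-1.0, 1.0], size=n)     # random sign flips
    HDA, HDb = H @ (D[:, None] * A), H @ (D * b)

    x, x_sum = np.asarray(x0, dtype=float).copy(), np.zeros(d)
    for _ in range(T):
        # Steps 4-5: mini-batch gradient estimate c_{tau_t}.
        tau = rng.integers(0, n, size=r)    # i.i.d. uniform row indices
        c = (2.0 * n / r) * HDA[tau].T @ (HDA[tau] @ x - HDb[tau])
        # Step 6 (with W = R^d): preconditioned gradient step.
        x = x - eta * R_inv @ R_inv.T @ c
        x_sum += x
    # Step 8: return the averaged iterate.
    return x_sum / T
```

The factor 2n/r keeps the mini-batch estimate unbiased for the full gradient of ||Ax - b||_2^2, and multiplying by R^{-1}(R^{-1})^T applies the preconditioner in the original coordinates.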
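
As a hypothetical illustration of the reported setup (zero initial point, fixed step size), the sketch above could be invoked as follows. The data, dimensions, and step size here are placeholders rather than the paper's configuration; in particular, η = 1/2 is quoted only for the full-gradient pwGradient method, not for the mini-batch variant.

```python
import numpy as np

# Uses hd_pw_batch_sgd from the sketch above.
rng = np.random.default_rng(0)
n, d = 1024, 20                      # n is a power of two for the Hadamard step
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

# Zero initial point, as in the reported experiments; the step size is
# illustrative only (the value from Theorem 2 is not reproduced here).
x_hat = hd_pw_batch_sgd(A, b, x0=np.zeros(d), T=500, r=32, eta=1e-3, rng=1)
print("residual norm:", np.linalg.norm(A @ x_hat - b))
```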