Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Weighted SGD for $\ell_p$ Regression with Randomized Preconditioning
Authors: Jiyan Yang, Yin-Lam Chow, Christopher Ré, Michael W. Mahoney
JMLR 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, the effectiveness of such algorithms is illustrated numerically on both synthetic and real datasets, and the results are consistent with our theoretical findings and demonstrate that PWSGD converges to a medium-precision solution, e.g., ϵ = 10 3, more quickly. [...] In this section, we provide empirical evaluations of our main algorithm PWSGD. We evaluate its convergence rate and overall running time on both synthetic and real datasets. |
| Researcher Affiliation | Academia | Jiyan Yang EMAIL Institute for Computational and Mathematical Engineering Stanford University Stanford, CA 94305, USA; Yin-Lam Chow EMAIL Institute for Computational and Mathematical Engineering Stanford University Stanford, CA 94305, USA; Christopher R e EMAIL Department of Computer Science Stanford University Stanford, CA 94305, USA; Michael W. Mahoney EMAIL International Computer Science Institute and Department of Statistics University of California, Berkeley Berkeley, CA 94720, USA |
| Pseudocode | Yes | Algorithm 1 PWSGD preconditioned weighted SGD for over-determined ℓ1 and ℓ2 regression [...] Algorithm 2 Compute ϵ-coreset [...] Algorithm 3 RLA methods with algorithmic leveraging for constrained ℓp regression |
| Open Source Code | No | The paper does not provide explicit statements or links to source code for the methodology described. |
| Open Datasets | Yes | name #rows # columns κ(A) Year7 5 105 90 2 103 Buzz8 5 105 77 108 [...] 7. https://archive.ics.uci.edu/ml/datasets/Year Prediction MSD 8. https://archive.ics.uci.edu/ml/datasets/Buzz+in+social+media+ |
| Dataset Splits | No | The paper describes generating synthetic datasets and using the 'Year' and 'Buzz' real datasets, but it does not specify any training/test/validation splits, percentages, or predefined splitting methodologies for these datasets. It mentions a sampling strategy for SGD within epochs but not for evaluation splits. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions several algorithms and methods (e.g., Ada Grad, SVRG, RLA, LSQR, interior-point method) but does not specify any software libraries, frameworks, or their version numbers used for implementing PWSGD or other comparative methods. |
| Experiment Setup | Yes | For each of these methods, given a target relative error ϵ = 0.1 on the objective, i.e., ( ˆf f )/f = 0.1, we use the optimal step-size suggested in the theory. [...] For Ada Grad, we use diagonal scaling and mirror descent update rule. For SVRG, we compute the full gradient every [n/2] iterations. As for implementation details, in all SGD-like algorithms, step-size tuning is done by grid-searching where at each trial the algorithm is run with a candidate step-size for enough iterations. Then the step-size that yields the lowest error within 10 seconds is used. The time/accuracy pair at every 2000 iterations is recorded. [...] In our experiments, we set m = 200. [...] the effect of stepsize η. When a constant stepsize is used, typically, a smaller η allows the algorithm to converge to a more accurate solution with a slower convergence rate. This is verified by Figure 6(a) in which the performance of PWSGD-full with larger η s saturates earlier at a coarser level while η = 0.001 allows the algorithm to achieve a finer solution. |