Optimal Randomized First-Order Methods for Least-Squares Problems

Authors: Jonathan Lacotte, Mert Pilanci

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | First, we generate a synthetic dataset with n = 8192, d = 1600 and m ∈ {1700, 3500, 5700}. Although our results are universal in the sense that they do not depend on the spectral decay of the matrix A, we still consider a challenging setting for first-order methods, that is, we generate an n × d matrix A with an exponential spectral decay. In Figure 2, we verify numerically that Algorithm 2 is faster than the best Heavy-ball method with refreshed SRHT sketches ("SRHT (refreshed)"), and than Algorithm 1. Further, we compare Algorithm 2 to the Heavy-ball method with fixed SRHT embedding whose parameters are found based on edge eigenvalues analysis, using either our new density f_h ("SRHT (edge eig.)") as described previously in Section 2.2.1, or the previous bounds derived by (Tropp, 2011) ("SRHT (baseline)"). As predicted, Algorithm 2 performs very similarly to the former, and better than the latter. We mention that we use small perturbations of the algorithmic parameters derived from our asymptotic analysis. Following the notations introduced in Theorem 1, instead of a_t and b_t, we use a_t^δ = (1 + δ)a_t and b_t^δ = (1 − δ)b_t with δ = 0.01. These conservative perturbations are necessary in practice due to the finite-sample approximations. We defer a detailed description of the experimental setup to Appendix C. Second, we verify on Figure 3 that our predicted convergence rates for Algorithms 1 and 2 are matched empirically. For this purpose, we generate a synthetic dataset with n = 8192, d ∈ {500, 1250, 2000} and varying sketching size, with a data matrix having an exponential spectral decay. Lastly, we verify our results on standard machine learning benchmarks, that is, we consider the MNIST and CIFAR10 datasets, for which results are respectively reported on Figures 4 and 5.
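The synthetic setup quoted above can be reproduced along the following lines. This is a minimal sketch, not the authors' code: the decay rate rho and the use of QR factors for the orthonormal bases are illustrative assumptions (the paper defers exact details to Appendix C).

```python
import numpy as np

def exp_decay_matrix(n=8192, d=1600, rho=0.98, seed=0):
    """n x d matrix with exponentially decaying singular values sigma_i = rho**i."""
    rng = np.random.default_rng(seed)
    # Orthonormal factors via (reduced) QR of Gaussian matrices.
    U, _ = np.linalg.qr(rng.standard_normal((n, d)))
    V, _ = np.linalg.qr(rng.standard_normal((d, d)))
    sigma = rho ** np.arange(d)  # exponential spectral decay
    return U @ np.diag(sigma) @ V.T

A = exp_decay_matrix()
b = np.random.default_rng(1).standard_normal(A.shape[0])
```

Smaller (n, d) work just as well for a quick check; the paper's sizes are used here only to match the quoted experiment.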
Researcher Affiliation | Academia | Jonathan Lacotte¹, Mert Pilanci¹ (¹Department of Electrical Engineering, Stanford University). Correspondence to: Jonathan Lacotte <lacotte@stanford.edu>.
Pseudocode | Yes | Algorithm 1: Optimal First-Order Method for Gaussian embeddings. Algorithm 2: Optimal First-Order Method for SRHT (or Haar) embeddings.
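The paper presents the algorithms only as pseudocode. Below is a schematic rendering of the kind of sketch-preconditioned Heavy-ball loop Algorithms 1 and 2 describe, with step sizes a_t and momentum b_t as in Theorem 1. The Gaussian sketch, the exact update form, and the helper name heavy_ball_sketched are illustrative assumptions; the paper's optimal (a_t, b_t) sequences are not reproduced here.

```python
import numpy as np

def heavy_ball_sketched(A, b, m, a_seq, b_seq, iters, seed=0):
    """Heavy-ball on min ||Ax - b||^2, preconditioned by a sketched Hessian."""
    n, d = A.shape
    rng = np.random.default_rng(seed)
    S = rng.standard_normal((m, n)) / np.sqrt(m)  # Gaussian sketch (assumption)
    SA = S @ A
    H_inv = np.linalg.inv(SA.T @ SA)              # sketched-Hessian preconditioner
    x_prev = np.zeros(d)
    x = np.zeros(d)
    for t in range(iters):
        grad = A.T @ (A @ x - b)                  # full least-squares gradient
        # Heavy-ball update: preconditioned gradient step plus momentum term.
        x, x_prev = x - a_seq[t] * (H_inv @ grad) + b_seq[t] * (x - x_prev), x
    return x
```

For SRHT or Haar embeddings (Algorithm 2), only the construction of S changes; the iteration structure is the same.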
Open Source Code | No | The paper does not provide any explicit statements about open-source code availability or links to a code repository for the methodology described.
Open Datasets | Yes | Lastly, we verify our results on standard machine learning benchmarks, that is, we consider the MNIST and CIFAR10 datasets, for which results are respectively reported on Figures 4 and 5.
Dataset Splits | No | The paper mentions using well-known datasets but does not provide specific details on how these datasets were split into training, validation, and test sets (e.g., percentages, sample counts, or citations to standard splits).
Hardware Specification | No | The paper does not specify any hardware details (e.g., CPU, GPU models, or cloud computing resources) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks).
Experiment Setup | Yes | We mention that we use small perturbations of the algorithmic parameters derived from our asymptotic analysis. Following the notations introduced in Theorem 1, instead of a_t and b_t, we use a_t^δ = (1 + δ)a_t and b_t^δ = (1 − δ)b_t with δ = 0.01. These conservative perturbations are necessary in practice due to the finite-sample approximations. We defer a detailed description of the experimental setup to Appendix C.
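The quoted perturbation is straightforward to apply in code. A minimal sketch, assuming placeholder a_t and b_t sequences (the true values come from the paper's Theorem 1 and are not reproduced here):

```python
import numpy as np

# Illustrative placeholder sequences; the actual a_t, b_t are given by Theorem 1.
a_seq = np.full(50, 0.5)
b_seq = np.full(50, 0.9)

delta = 0.01
a_delta = (1 + delta) * a_seq  # a_t^delta = (1 + delta) a_t: slightly larger step
b_delta = (1 - delta) * b_seq  # b_t^delta = (1 - delta) b_t: slightly damped momentum
```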