Optimal Randomized First-Order Methods for Least-Squares Problems
Authors: Jonathan Lacotte, Mert Pilanci
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | First, we generate a synthetic dataset with n = 8192, d = 1600 and m ∈ {1700, 3500, 5700}. Although our results are universal in the sense that they do not depend on the spectral decay of the matrix A, we still consider a challenging setting for first-order methods, that is, we generate an n × d matrix A with an exponential spectral decay. In Figure 2, we verify numerically that Algorithm 2 is faster than the best Heavy-ball method with refreshed SRHT sketches (SRHT (refreshed)), and than Algorithm 1. Further, we compare Algorithm 2 to the Heavy-ball method with fixed SRHT embedding whose parameters are found based on edge eigenvalues analysis, using either our new density f_h (SRHT (edge eig.)) as described previously in Section 2.2.1, or the previous bounds derived by (Tropp, 2011) (SRHT (baseline)). As predicted, Algorithm 2 performs very similarly to the former, and better than the latter. We mention that we use small perturbations of the algorithmic parameters derived from our asymptotic analysis. Following the notations introduced in Theorem 1, instead of a_t and b_t, we use a_t^δ = (1 + δ)a_t and b_t^δ = (1 − δ)b_t with δ = 0.01. These conservative perturbations are necessary in practice due to the finite-sample approximations. We defer a detailed description of the experimental setup to Appendix C. Second, we verify that our predicted convergence rates for Algorithms 1 and 2 are matched empirically, on Figure 3. For this purpose, we generate a synthetic dataset with n = 8192, d ∈ {500, 1250, 2000} and varying sketching size, with a data matrix having an exponential spectral decay. Lastly, we verify our results on standard machine learning benchmarks, that is, we consider the MNIST and CIFAR10 datasets, for which results are respectively reported on Figures 4 and 5. (A minimal sketch of this synthetic data-generation setup appears after the table.) |
| Researcher Affiliation | Academia | Jonathan Lacotte¹, Mert Pilanci¹. ¹Department of Electrical Engineering, Stanford University. Correspondence to: Jonathan Lacotte <lacotte@stanford.edu>. |
| Pseudocode | Yes | Algorithm 1: Optimal First-Order Method for Gaussian embeddings. Algorithm 2: Optimal First-Order Method for SRHT (or Haar) embeddings. |
| Open Source Code | No | The paper does not provide any explicit statements about open-source code availability or links to a code repository for the methodology described. |
| Open Datasets | Yes | Lastly, we verify our results on standard machine learning benchmarks, that is, we consider the MNIST and CIFAR10 datasets, for which results are respectively reported on Figures 4 and 5. |
| Dataset Splits | No | The paper mentions using well-known datasets but does not provide specific details on how these datasets were split into training, validation, and test sets (e.g., percentages, sample counts, or citations to standard splits). |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., CPU, GPU models, or cloud computing resources) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks). |
| Experiment Setup | Yes | We mention that we use small perturbations of the algorithmic parameters derived from our asymptotic analysis. Following the notations introduced in Theorem 1, instead of a_t and b_t, we use a_t^δ = (1 + δ)a_t and b_t^δ = (1 − δ)b_t with δ = 0.01. These conservative perturbations are necessary in practice due to the finite-sample approximations. We defer a detailed description of the experimental setup to Appendix C. (A sketch of this perturbation applied in a generic Heavy-ball-style update appears after the table.) |
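
For context on the "Research Type" evidence above, the following is a minimal sketch of the quoted synthetic setup: an n × d matrix A with exponentially decaying singular values, plus an SRHT-style sketch of the least-squares data. This is not the authors' code (the paper releases none); the reduced sizes n = 1024, d = 128, m = 256, the decay rate 0.98, and the SRHT scaling convention are assumptions made so the example runs quickly and self-contained.

```python
# Minimal sketch (not the authors' implementation) of the quoted experimental setup:
# an n x d matrix A with exponential spectral decay and an SRHT-style sketch S A.
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(0)
n, d, m = 1024, 128, 256            # scaled down from the paper's n = 8192, d = 1600;
                                    # n must be a power of 2 for the Hadamard transform

# Exponential spectral decay: singular values sigma_i = rho**i (rho = 0.98 is assumed).
U, _ = np.linalg.qr(rng.standard_normal((n, d)))
V, _ = np.linalg.qr(rng.standard_normal((d, d)))
sigma = 0.98 ** np.arange(d)
A = U @ np.diag(sigma) @ V.T
b = rng.standard_normal(n)

# SRHT: S = sqrt(n/m) * R H D, with D a random sign flip, H the normalized
# Walsh-Hadamard matrix, and R a uniform row subsampler of size m.
D = rng.choice([-1.0, 1.0], size=n)
H = hadamard(n) / np.sqrt(n)
rows = rng.choice(n, size=m, replace=False)
SA = np.sqrt(n / m) * H[rows] @ (D[:, None] * A)
Sb = np.sqrt(n / m) * H[rows] @ (D * b)

print(SA.shape, Sb.shape)           # (m, d) and (m,): sketched least-squares data
```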
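The "Experiment Setup" row quotes the conservative perturbation a_t^δ = (1 + δ)a_t, b_t^δ = (1 − δ)b_t with δ = 0.01. The sketch below only illustrates how such a perturbation could be applied inside a generic Heavy-ball-style recursion; the true parameter sequences a_t, b_t come from Theorem 1 of the paper and the exact iteration is the paper's Algorithm 1/2, so the callables `step_a` and `momentum_b` and the update form used here are placeholders, not the authors' method.

```python
# Hedged illustration of the (1 +/- delta) safety perturbation from the quoted text,
# applied to a generic Heavy-ball update x_{t+1} = x_t - a_t grad(x_t) + b_t (x_t - x_{t-1}).
import numpy as np

def heavy_ball_perturbed(grad, x0, step_a, momentum_b, n_iters=100, delta=0.01):
    """Run a Heavy-ball-style iteration with perturbed parameters a_t^delta, b_t^delta."""
    x_prev, x = x0.copy(), x0.copy()
    for t in range(n_iters):
        a_t = (1 + delta) * step_a(t)       # a_t^delta = (1 + delta) a_t
        b_t = (1 - delta) * momentum_b(t)   # b_t^delta = (1 - delta) b_t
        x_next = x - a_t * grad(x) + b_t * (x - x_prev)
        x_prev, x = x, x_next
    return x
```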