Risk Bounds of Accelerated SGD for Overparameterized Linear Regression

Authors: Xuheng Li, Yihe Deng, Jingfeng Wu, Dongruo Zhou, Quanquan Gu

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In this section, we empirically verify that ASGD can outperform SGD when w0 w is mainly confined to the eigen-subspace of small eigenvalues. Data model. Our experiments are based on the setting of overparameterized linear regression, where the model dimenstion is d = 2000.
Researcher Affiliation Academia 1Department of Computer Science, University of California, Los Angeles, CA 90095, USA 2Simons Institute, University of California, Berkeley, CA 94720, USA 3Department of Computer Science, Indiana University Bloomington, IN 47408, USA
Pseudocode No No pseudocode or algorithm blocks were found in the provided text.
Open Source Code No No explicit statement or link to open-source code for the described methodology was found in the provided text.
Open Datasets No The paper describes a synthetic data generation process: "The data covariance matrix H is diagonal with eigenvalues λi = i 2. The input xt follows Gaussian distribution N(0, H)... The ground truth weight vector is w = 0, and the label yt follows the distribution N(0, σ2) where σ2 = 0.01." This is a generative model, not a publicly accessible fixed dataset.
Dataset Splits No The paper describes a synthetic data generation process for experiments but does not specify training, validation, or test dataset splits in terms of percentages or sample counts for reproduction.
Hardware Specification No No specific hardware details (e.g., GPU models, CPU types, or cloud instance specifications) used for running the experiments were provided in the text.
Software Dependencies No No specific software dependencies with version numbers (e.g., library names, frameworks) were mentioned in the provided text.
Experiment Setup Yes Hyperparameters of ASGD and SGD. We select parameters of ASGD so that it satisfies the requirements in (4.2). We first let eκ = 5. According to (4.2), δ satisfies δ 1/π2, so we pick δ = 0.1, which is also the stepsize of SGD. We then let α = 0.9875, so that (1 c)/δ = 2(1 α)/δ = 0.25 = λ2, which implies that bk = 2. Finally, we select β = (1 α)/α and γ = δ/(ψeκβ). We can verify that the parameters satisfy all requirements in (4.2). We fix the length of tail averaging as N = 500, and conduct experiments on different s where s = 50, 100, 150, . . . , 500.