Risk Bounds of Accelerated SGD for Overparameterized Linear Regression
Authors: Xuheng Li, Yihe Deng, Jingfeng Wu, Dongruo Zhou, Quanquan Gu
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically verify that ASGD can outperform SGD when w_0 − w^* is mainly confined to the eigen-subspace of small eigenvalues. Data model. Our experiments are based on the setting of overparameterized linear regression, where the model dimension is d = 2000. |
| Researcher Affiliation | Academia | 1Department of Computer Science, University of California, Los Angeles, CA 90095, USA 2Simons Institute, University of California, Berkeley, CA 94720, USA 3Department of Computer Science, Indiana University Bloomington, IN 47408, USA |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the provided text. |
| Open Source Code | No | No explicit statement or link to open-source code for the described methodology was found in the provided text. |
| Open Datasets | No | The paper describes a synthetic data generation process: "The data covariance matrix H is diagonal with eigenvalues λ_i = i^{-2}. The input x_t follows the Gaussian distribution N(0, H)... The ground truth weight vector is w^* = 0, and the label y_t follows the distribution N(0, σ^2) where σ^2 = 0.01." This is a generative model, not a publicly accessible fixed dataset. A hedged sketch of this generation process is given after the table. |
| Dataset Splits | No | The paper describes a synthetic data generation process for experiments but does not specify training, validation, or test dataset splits in terms of percentages or sample counts for reproduction. |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, or cloud instance specifications) used for running the experiments were provided in the text. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., library names, frameworks) were mentioned in the provided text. |
| Experiment Setup | Yes | Hyperparameters of ASGD and SGD. We select parameters of ASGD so that they satisfy the requirements in (4.2). We first let κ̃ = 5. According to (4.2), δ satisfies δ ≤ 1/π², so we pick δ = 0.1, which is also the stepsize of SGD. We then let α = 0.9875, so that (1 − c)/δ = 2(1 − α)/δ = 0.25 = λ_2, which implies that k̂ = 2. Finally, we select β = (1 − α)/α and γ = δ/(ψ κ̃ β). We can verify that these parameters satisfy all requirements in (4.2). We fix the length of tail averaging as N = 500 and conduct experiments for different s, where s = 50, 100, 150, ..., 500. A hedged sketch of these parameter choices is given after the table. |
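
The synthetic data model quoted in the table (d = 2000, λ_i = i^{-2}, w^* = 0, σ^2 = 0.01) can be written out as a minimal sketch. The sampling routine, variable names, and random seed below are assumptions for illustration, not the authors' code.

```python
import numpy as np

# Sketch of the quoted data model: x ~ N(0, H) with diagonal H,
# lambda_i = i^{-2}, ground truth w* = 0, label noise sigma^2 = 0.01.
d = 2000                                              # model dimension
sigma2 = 0.01                                         # label noise variance
eigvals = np.arange(1, d + 1, dtype=float) ** -2.0    # spectrum of H: lambda_i = i^{-2}
w_star = np.zeros(d)                                  # ground-truth weights w* = 0
rng = np.random.default_rng(0)                        # assumed seed, not from the paper

def sample_batch(n):
    """Draw n i.i.d. examples with x ~ N(0, H) (H diagonal) and
    y = <w*, x> + noise, noise ~ N(0, sigma^2)."""
    x = rng.normal(size=(n, d)) * np.sqrt(eigvals)    # diagonal H: scale each coordinate
    y = x @ w_star + rng.normal(scale=np.sqrt(sigma2), size=n)
    return x, y
```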
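The hyperparameter choices quoted in the last row can likewise be sketched. The constant `psi` below is a placeholder for the quantity ψ in the paper's condition (4.2), whose numerical value is not given in the quoted text, and the tail-averaged SGD loop is a generic baseline for reference; the paper's ASGD recursion itself is not reproduced in the extracted text, so it is not implemented here.

```python
import numpy as np

# Hedged sketch of the quoted hyperparameter choices.
kappa_tilde = 5.0                   # kappa-tilde = 5
delta = 0.1                         # delta, also the SGD stepsize (delta <= 1/pi^2 ~ 0.101)
alpha = 0.9875                      # chosen so that 2(1 - alpha)/delta = 0.25 = lambda_2
beta = (1 - alpha) / alpha          # beta = (1 - alpha)/alpha
psi = 1.0                           # placeholder for psi (value not in the quoted text)
gamma = delta / (psi * kappa_tilde * beta)   # gamma = delta / (psi * kappa_tilde * beta)

N_tail = 500                        # tail-averaging length N
s_values = range(50, 501, 50)       # s = 50, 100, ..., 500

# sanity check: 2(1 - alpha)/delta equals lambda_2 = 2^{-2} = 0.25
assert abs(2 * (1 - alpha) / delta - 0.25) < 1e-12


def tail_averaged_sgd(w0, sampler, num_steps, stepsize=delta, tail_len=N_tail):
    """Generic tail-averaged SGD on streaming least squares: one fresh sample
    per step, averaging the last `tail_len` iterates. A reference baseline
    only, not the paper's accelerated method."""
    w = w0.copy()
    tail_sum = np.zeros_like(w0)
    for t in range(num_steps):
        x, y = sampler()                          # one (x_t, y_t) pair
        w = w - stepsize * (w @ x - y) * x        # gradient step on 0.5 * (<w, x> - y)^2
        if t >= num_steps - tail_len:
            tail_sum += w
    return tail_sum / tail_len
```

Used with the data-model sketch above, `tail_averaged_sgd(w0, lambda: tuple(v[0] for v in sample_batch(1)), num_steps=2000)` would run the baseline; how the quoted parameter s positions the initial error w_0 − w^* is not specified in the extracted text, so it is left to the caller here.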