Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression
Authors: Aymeric Dieuleveut, Nicolas Flammarion, Francis Bach
JMLR 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 7. Experiments. We illustrate now our theoretical results on synthetic examples. For d = 25 we consider normally distributed inputs xₙ with random covariance matrix Σ which has eigenvalues 1/i³, for i = 1, ..., d, and random optimum θ∗ and starting point θ₀ such that ‖θ₀ − θ∗‖ = 1. The outputs yₙ are generated from a linear function with homoscedastic noise with unit signal-to-noise ratio (σ² = 1); we take R² = tr Σ, the average radius of the data, a step-size γ = 1/R², and λ = 0. The additive noise oracle is used. We show results averaged over 10 replications. We compare the performance of averaged SGD (Av SGD), Acc SGD (usual Nesterov acceleration for convex functions), and our novel averaged accelerated SGD from Section 4, Av Acc SGD (which is not the averaging of Acc SGD, because the momentum term is proportional to 1 − 3/n for Acc SGD instead of being equal to 1 for Av Acc SGD), on two different problems: one deterministic (‖θ₀ − θ∗‖ = 1, σ² = 0), which illustrates how the bias term behaves, and one purely stochastic (‖θ₀ − θ∗‖ = 0, σ² = 1), which illustrates how the variance term behaves. For the bias (left plot of Figure 1), Av SGD converges at speed O(1/n), while Av Acc SGD and Acc SGD both converge at speed O(1/n²). However, as mentioned in the observations following Corollary 4, Acc SGD takes advantage of the hidden strong convexity of the least-squares function and starts converging linearly at the end. For the variance (right plot of Figure 1), Acc SGD does not converge to the optimum and keeps oscillating, whereas Av SGD and Av Acc SGD both converge to the optimum at speed O(1/n); however, Av SGD remains slightly faster in the beginning. Note that for small n, or when the bias L‖θ₀ − θ∗‖²/n² is much bigger than the variance σ²d/n, the bias may have a stronger effect, although asymptotically the variance always dominates. It is thus essential to have an algorithm which is optimal in both regimes; this is achieved by Av Acc SGD. |
| Researcher Affiliation | Academia | INRIA, Département d'Informatique de l'ENS, École normale supérieure, CNRS, PSL Research University, Paris, France. The email addresses (e.g., @ens.fr) are also associated with academic institutions. |
| Pseudocode | No | The paper describes algorithms (Stochastic gradient descent and Accelerated stochastic gradient descent) using mathematical equations (2) and (3) and accompanying text, but it does not include a clearly labeled pseudocode block or algorithm listing. |
| Open Source Code | No | The paper does not provide any statement about releasing code or a link to a code repository. The mentioned CC-BY 4.0 license pertains to the paper itself, not the implementation code. |
| Open Datasets | No | The experiments section explicitly states the use of "synthetic examples": "7. Experiments We illustrate now our theoretical results on synthetic examples." Therefore, no publicly available dataset was used. |
| Dataset Splits | No | The paper uses synthetic data, generated according to specified parameters (e.g., "for d = 25 we consider normally distributed inputs xₙ with random covariance matrix Σ which has eigenvalues 1/i³"). It describes how data is generated for simulation but does not specify train/test/validation splits as would be typical for experiments on pre-existing datasets. |
| Hardware Specification | No | The paper describes the setup for its synthetic experiments in Section 7 but does not mention any specific hardware (e.g., GPU models, CPU types, or cloud computing environments) used to run these experiments. |
| Software Dependencies | No | The paper does not mention any specific software libraries, frameworks, or their version numbers used in the implementation or for conducting the experiments. |
| Experiment Setup | Yes | 7. Experiments. We illustrate now our theoretical results on synthetic examples. For d = 25 we consider normally distributed inputs xₙ with random covariance matrix Σ which has eigenvalues 1/i³, for i = 1, ..., d, and random optimum θ∗ and starting point θ₀ such that ‖θ₀ − θ∗‖ = 1. The outputs yₙ are generated from a linear function with homoscedastic noise with unit signal-to-noise ratio (σ² = 1); we take R² = tr Σ, the average radius of the data, a step-size γ = 1/R², and λ = 0. The additive noise oracle is used. We show results averaged over 10 replications. |
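The experimental setup quoted in the table is concrete enough to sketch directly. The following NumPy script is an illustrative reconstruction, not the authors' code: it generates the synthetic least-squares stream (d = 25, covariance eigenvalues 1/i³, ‖θ₀ − θ∗‖ = 1, σ² = 1, γ = 1/R²) and runs plain averaged SGD with Polyak–Ruppert averaging. The accelerated variants (Acc SGD, Av Acc SGD) are omitted; all variable names are our own.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup from Section 7 (illustrative reconstruction):
# d = 25, covariance eigenvalues 1/i^3 in a random orthogonal basis.
d, n = 25, 10_000
eigvals = 1.0 / np.arange(1, d + 1) ** 3
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))   # random orthogonal basis
Sigma = Q @ np.diag(eigvals) @ Q.T

theta_star = rng.standard_normal(d)                # random optimum
v = rng.standard_normal(d)
theta0 = theta_star + v / np.linalg.norm(v)        # ||theta0 - theta*|| = 1

sigma2 = 1.0                                       # homoscedastic noise, unit SNR
R2 = np.trace(Sigma)                               # average squared radius of the data
gamma = 1.0 / R2                                   # step size gamma = 1/R^2

# Averaged SGD on the streaming least-squares problem.
L = np.linalg.cholesky(Sigma)                      # to sample x ~ N(0, Sigma)
theta, theta_bar = theta0.copy(), np.zeros(d)
for t in range(1, n + 1):
    x = L @ rng.standard_normal(d)
    y = x @ theta_star + np.sqrt(sigma2) * rng.standard_normal()
    theta -= gamma * (x @ theta - y) * x           # stochastic gradient step
    theta_bar += (theta - theta_bar) / t           # running (Polyak-Ruppert) average

# Excess risk of the averaged iterate: (1/2)(theta_bar - theta*)' Sigma (theta_bar - theta*)
excess_risk = 0.5 * (theta_bar - theta_star) @ Sigma @ (theta_bar - theta_star)
```

Under this setup the initial excess risk is at most 1/2 (since ‖θ₀ − θ∗‖ = 1 and the largest eigenvalue of Σ is 1), and after n = 10,000 samples the averaged iterate's excess risk should be far below that, consistent with the O(σ²d/n) variance regime discussed in the quoted excerpt.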