Last iterate convergence of SGD for Least-Squares in the Interpolation regime.

Authors: Aditya Vardhan Varre, Loucas Pillaud-Vivien, Nicolas Flammarion

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | Motivated by the recent successes of neural networks that have the ability to fit the data perfectly and generalize well, we study the noiseless model in the fundamental least-squares setup. We assume that an optimum predictor perfectly fits the inputs and outputs ⟨θ*, φ(X)⟩ = Y, where φ(X) stands for a possibly infinite dimensional non-linear feature map. To solve this problem, we consider the estimator given by the last iterate of stochastic gradient descent (SGD) with constant step-size. In this context, our contribution is twofold: (i) from a (stochastic) optimization perspective, we exhibit an archetypal problem where we can show explicitly the convergence of SGD final iterate for a non-strongly convex problem with constant step-size whereas usual results use some form of average and (ii) from a statistical perspective, we give explicit non-asymptotic convergence rates in the over-parameterized setting and leverage a fine-grained parameterization of the problem to exhibit polynomial rates that can be faster than O(1/T). The link with reproducing kernel Hilbert spaces is established.
Researcher Affiliation | Academia | Aditya Varre (EPFL, aditya.varre@epfl.ch), Loucas Pillaud-Vivien (EPFL, loucas.pillaud-vivien@epfl.ch), Nicolas Flammarion (EPFL, nicolas.flammarion@epfl.ch)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about providing open-source code for the methodology or a link to a code repository.
Open Datasets | No | The paper uses a "synthetic least-squares problem" with "normally distributed inputs (x_n)_n" and describes how the optimum is chosen and outputs generated. However, it does not provide concrete access information (link, DOI, specific repository, or citation to a public dataset) for this synthetic data generation process that would allow a third party to access or reproduce the exact dataset used.
Dataset Splits | No | The paper does not explicitly provide specific dataset split information (percentages, sample counts, or citations to predefined splits) for training, validation, and testing. It describes using a synthetic dataset but no splitting methodology is detailed.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers, such as programming languages, libraries, or frameworks used for the experiments.
Experiment Setup | Yes | For d = 300 we consider a stream of normally distributed inputs (x_n)_n whose covariance matrix H has random eigenvectors v_i and eigenvalues 1/i^{1/(1−α)} for i = 1, ..., d. The optimum is chosen randomly: θ* = Σ_i 1/i^{(1−β/(1−α))/2} v_i. This allows us to reproduce the setting where the coefficients α and β of the capacity and source conditions are perfectly controlled. The outputs (y_n)_n are generated through y_n = ⟨θ*, x_n⟩. We take a step-size γ = 1/(2 Tr H). (A minimal sketch based on this description is given after the table.)
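
For concreteness, below is a minimal NumPy sketch of the synthetic setup quoted in the Experiment Setup row. It is not the authors' code: the values of alpha, beta, the random seed and the number of SGD steps (n_steps) are illustrative placeholders, since the quoted description only fixes d = 300 and the step-size gamma = 1/(2 Tr H), and the exponent used for theta_star follows our reading of the quoted formula, (1 − β/(1−α))/2.

import numpy as np

# Dimension is fixed to 300 in the paper; alpha, beta and the seed are illustrative choices.
d, alpha, beta = 300, 0.5, 0.5
rng = np.random.default_rng(0)

# Random orthonormal eigenvectors v_i and power-law eigenvalues 1/i^{1/(1 - alpha)}.
V, _ = np.linalg.qr(rng.standard_normal((d, d)))
idx = np.arange(1, d + 1)
eigvals = 1.0 / idx ** (1.0 / (1.0 - alpha))
H = (V * eigvals) @ V.T                    # covariance matrix H = V diag(eigvals) V^T

# Optimum with coordinates 1/i^{(1 - beta/(1 - alpha))/2} along the eigenvectors v_i
# (the exponent is our reading of the quoted formula).
theta_star = V @ (1.0 / idx ** ((1.0 - beta / (1.0 - alpha)) / 2.0))

# Constant step-size from the quoted setup: gamma = 1 / (2 Tr H).
gamma = 1.0 / (2.0 * np.trace(H))

# Streaming SGD on the noiseless (interpolating) least-squares problem:
#   theta_{n+1} = theta_n - gamma * (<theta_n, x_n> - y_n) * x_n,  with y_n = <theta_star, x_n>.
L = V * np.sqrt(eigvals)                   # x = L z with z ~ N(0, I) has covariance H
theta = np.zeros(d)
n_steps = 10_000                           # illustrative horizon, not specified in the quote
for _ in range(n_steps):
    x = L @ rng.standard_normal(d)         # fresh sample x_n ~ N(0, H)
    y = theta_star @ x                     # label without noise (interpolation regime)
    theta -= gamma * (theta @ x - y) * x

# Excess risk of the last iterate: 0.5 * (theta - theta_star)^T H (theta - theta_star).
err = theta - theta_star
print("excess risk of last iterate:", 0.5 * (err @ H @ err))

Tracking the printed excess risk along the iterations is what would reproduce the polynomial decay, possibly faster than 1/T, discussed in the abstract; the specific (alpha, beta) pairs plotted in the paper's figures are not given in the quoted text.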