Convergence beyond the over-parameterized regime using Rayleigh quotients
Authors: David A. R. Robin, Kevin Scaman, Marc Lelarge
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we present a new strategy to prove the convergence of deep learning architectures to a zero training (or even testing) loss by gradient flow. Our analysis is centered on the notion of Rayleigh quotients in order to prove Kurdyka-Łojasiewicz inequalities for a broader set of neural network architectures and loss functions. We show that Rayleigh quotients provide a unified view for several convergence analysis techniques in the literature. Our strategy produces a proof of convergence for various examples of parametric learning. In particular, our analysis does not require the number of parameters to tend to infinity, nor the number of samples to be finite, thus extending to test loss minimization and beyond the over-parameterized regime. (A hedged sketch of this Rayleigh-quotient argument is given after the table.) |
| Researcher Affiliation | Academia | David A. R. Robin (INRIA, École Normale Supérieure, PSL Research University, david.a.r.robin@gmail.com); Kevin Scaman (INRIA, École Normale Supérieure, PSL Research University, kevin.scaman@inria.fr); Marc Lelarge (INRIA, École Normale Supérieure, PSL Research University, marc.lelarge@ens.fr) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide explicit statements or links indicating the release of source code for the described methodology. |
| Open Datasets | No | The paper uses abstract data distributions (e.g., 'distribution D on X', 'uniform distribution on the interval [−R, +R]') and constructs toy examples ('Bernoulli's lemniscate'), but does not provide specific links, DOIs, repository names, or formal citations for publicly available datasets used for training. |
| Dataset Splits | No | The paper does not provide specific dataset split information (e.g., exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | Figure 2: Loss level sets with parameters t = (4, −1) and f(t) = −3, corresponding to quadratic loss ℓ: (a, b) ↦ (4a − b + 3)² and convergence speed with step size 10⁻³ and initial estimate θ(0) = 0. Both flows converge to the same functional minimum (F_S(θ*_S) = F_L(θ*_L)), the one depicted on the bottom in (a). (A hedged numerical reconstruction of this toy setup is sketched after the table.) |
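
The abstract quoted in the Research Type row describes proving Kurdyka-Łojasiewicz inequalities through Rayleigh quotients. The block below is a minimal sketch of how such an argument typically proceeds, not the paper's exact statement: the composition F = ℓ ∘ f, the Gram matrix K, and the constants μ and c are illustrative notation assumed here.

```latex
% Hedged sketch (illustrative notation, not the paper's exact statement):
% chain rule + a Rayleigh-quotient lower bound give a Kurdyka-Lojasiewicz
% inequality, hence exponential decay of the loss along the gradient flow.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
\begin{gather*}
F(\theta) = \ell\bigl(f(\theta)\bigr), \qquad
\nabla F(\theta) = Df(\theta)^{\top}\,\nabla\ell\bigl(f(\theta)\bigr), \\
\|\nabla F(\theta)\|^{2}
  = \nabla\ell^{\top} K(\theta)\,\nabla\ell
  = R_{K(\theta)}(\nabla\ell)\,\|\nabla\ell\|^{2},
\qquad K(\theta) = Df(\theta)\,Df(\theta)^{\top},
\qquad R_{K}(v) = \frac{\langle v, Kv\rangle}{\|v\|^{2}}, \\
\text{If } R_{K(\theta)}(\nabla\ell) \ge \mu > 0
\text{ and } \|\nabla\ell\|^{2} \ge c\,\ell
\text{ along the trajectory, then }
\|\nabla F(\theta)\|^{2} \ge \mu c\,F(\theta), \\
\text{so } \dot\theta = -\nabla F(\theta)
\ \Rightarrow\
\tfrac{d}{dt}F(\theta(t)) = -\|\nabla F(\theta(t))\|^{2}
\le -\mu c\,F(\theta(t))
\ \Rightarrow\
F(\theta(t)) \le F(\theta(0))\,e^{-\mu c\,t}.
\end{gather*}
\end{document}
```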
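
The Experiment Setup row only quotes the Figure 2 caption, and no code is released, so the snippet below is a hypothetical reconstruction of that toy run under the stated assumptions (loss ℓ(a, b) = (4a − b + 3)², step size 10⁻³, initial estimate θ(0) = 0); it is a sanity check of the reported setup, not the authors' implementation.

```python
import numpy as np

# Hypothetical reconstruction of the toy run in the Figure 2 caption:
# quadratic loss l(a, b) = (4a - b + 3)^2, gradient descent with step
# size 1e-3 and initial estimate theta(0) = 0.  The parameterization and
# iteration count are assumptions; the authors' code is not released.

def loss(theta: np.ndarray) -> float:
    a, b = theta
    return (4.0 * a - b + 3.0) ** 2

def grad(theta: np.ndarray) -> np.ndarray:
    a, b = theta
    r = 4.0 * a - b + 3.0                  # residual 4a - b + 3
    return np.array([8.0 * r, -2.0 * r])   # d(r^2)/da, d(r^2)/db

theta = np.zeros(2)                        # theta(0) = 0
step = 1e-3                                # step size from the caption
for _ in range(10_000):
    theta = theta - step * grad(theta)

print(f"theta = {theta}, loss = {loss(theta):.2e}")  # loss should be ~0
```

With these choices the residual contracts by a factor (1 − 34·10⁻³) at every step, so the loss reaches numerical zero well within the 10⁴ iterations used above.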