Convergence beyond the over-parameterized regime using Rayleigh quotients
Authors: David A. R. Robin, Kevin Scaman, Marc Lelarge
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we present a new strategy to prove the convergence of deep learning architectures to a zero training (or even testing) loss by gradient flow. Our analysis is centered on the notion of Rayleigh quotients in order to prove Kurdyka-Łojasiewicz inequalities for a broader set of neural network architectures and loss functions. We show that Rayleigh quotients provide a unified view for several convergence analysis techniques in the literature. Our strategy produces a proof of convergence for various examples of parametric learning. In particular, our analysis does not require the number of parameters to tend to infinity, nor the number of samples to be finite, thus extending to test loss minimization and beyond the over-parameterized regime. (A hedged sketch of this Rayleigh-quotient argument is given after the table.) |
| Researcher Affiliation | Academia | David A. R. Robin (INRIA, École Normale Supérieure, PSL Research University, david.a.r.robin@gmail.com); Kevin Scaman (INRIA, École Normale Supérieure, PSL Research University, kevin.scaman@inria.fr); Marc Lelarge (INRIA, École Normale Supérieure, PSL Research University, marc.lelarge@ens.fr) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide explicit statements or links indicating the release of source code for the described methodology. |
| Open Datasets | No | The paper uses abstract data distributions (e.g., 'distribution D on X', 'uniform distribution on the interval [−R, +R]') and constructs toy examples ('Bernoulli's lemniscate'), but does not provide specific links, DOIs, repository names, or formal citations for publicly available datasets used for training. |
| Dataset Splits | No | The paper does not provide specific dataset split information (e.g., exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | Figure 2: Loss level sets with parameters t = (4, −1) and f(t) = −3, corresponding to quadratic loss ℓ: (a, b) ↦ (4a − b + 3)² and convergence speed with step size 10⁻³ and initial estimate θ(0) = 0. Both flows converge to the same functional minimum (F_S(θ*_S) = F_L(θ*_L)), the one depicted on the bottom in (a). (A hedged numerical reconstruction of this toy setup is sketched after the table.) |
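
The abstract quoted in the Research Type row describes proving Kurdyka-Łojasiewicz inequalities through Rayleigh quotients. The block below is a minimal sketch of how such an argument typically proceeds, not the paper's exact statement: the composition F = ℓ ∘ f, the Gram matrix K, and the constants μ and c are illustrative notation assumed here.

```latex
% Hedged sketch (illustrative notation, not the paper's exact statement):
% chain rule + a Rayleigh-quotient lower bound give a Kurdyka-Lojasiewicz
% inequality, hence exponential decay of the loss along the gradient flow.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
\begin{gather*}
F(\theta) = \ell\bigl(f(\theta)\bigr), \qquad
\nabla F(\theta) = Df(\theta)^{\top}\,\nabla\ell\bigl(f(\theta)\bigr), \\
\|\nabla F(\theta)\|^{2}
  = \nabla\ell^{\top} K(\theta)\,\nabla\ell
  = R_{K(\theta)}(\nabla\ell)\,\|\nabla\ell\|^{2},
\qquad K(\theta) = Df(\theta)\,Df(\theta)^{\top},
\qquad R_{K}(v) = \frac{\langle v, Kv\rangle}{\|v\|^{2}}, \\
\text{If } R_{K(\theta)}(\nabla\ell) \ge \mu > 0
\text{ and } \|\nabla\ell\|^{2} \ge c\,\ell
\text{ along the trajectory, then }
\|\nabla F(\theta)\|^{2} \ge \mu c\,F(\theta), \\
\text{so } \dot\theta = -\nabla F(\theta)
\ \Rightarrow\
\tfrac{d}{dt}F(\theta(t)) = -\|\nabla F(\theta(t))\|^{2}
\le -\mu c\,F(\theta(t))
\ \Rightarrow\
F(\theta(t)) \le F(\theta(0))\,e^{-\mu c\,t}.
\end{gather*}
\end{document}
```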
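
The Experiment Setup row only quotes the Figure 2 caption, and no code is released, so the snippet below is a hypothetical reconstruction of that toy run under the stated assumptions (loss ℓ(a, b) = (4a − b + 3)², step size 10⁻³, initial estimate θ(0) = 0); it is a sanity check of the reported setup, not the authors' implementation.

```python
import numpy as np

# Hypothetical reconstruction of the toy run in the Figure 2 caption:
# quadratic loss l(a, b) = (4a - b + 3)^2, gradient descent with step
# size 1e-3 and initial estimate theta(0) = 0.  The parameterization and
# iteration count are assumptions; the authors' code is not released.

def loss(theta: np.ndarray) -> float:
    a, b = theta
    return (4.0 * a - b + 3.0) ** 2

def grad(theta: np.ndarray) -> np.ndarray:
    a, b = theta
    r = 4.0 * a - b + 3.0                  # residual 4a - b + 3
    return np.array([8.0 * r, -2.0 * r])   # d(r^2)/da, d(r^2)/db

theta = np.zeros(2)                        # theta(0) = 0
step = 1e-3                                # step size from the caption
for _ in range(10_000):
    theta = theta - step * grad(theta)

print(f"theta = {theta}, loss = {loss(theta):.2e}")  # loss should be ~0
```

With these choices the residual contracts by a factor (1 − 34·10⁻³) at every step, so the loss reaches numerical zero well within the 10⁴ iterations used above.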