Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Tight Convergence Rate Bounds for Optimization Under Power Law Spectral Conditions
Authors: Maksim Velikanov, Dmitry Yarotsky
JMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that the obtained convergence bounds and acceleration strategies are not only relevant for exactly quadratic optimization problems, but also fairly accurate when applied to the training of neural networks. Keywords: Gradient Descent, Steepest Descent, Heavy Ball, Conjugate Gradients, power-law spectrum, convergence rate, tight bounds, non-strongly-convex least squares, acceleration, neural networks |
| Researcher Affiliation | Collaboration | Maksim Velikanov EMAIL Technology Innovation Institute, Abu Dhabi, UAE; CMAP, Ecole Polytechnique, Paris, France Dmitry Yarotsky EMAIL Center for Artificial Intelligence Technology Skolkovo Institute of Science and Technology Moscow, Russia |
| Pseudocode | No | The paper describes optimization algorithms (Gradient Descent, Steepest Descent, Heavy Ball, Conjugate Gradients) using mathematical formulas and descriptions (e.g., "w_{n+1} = w_n − α_n ∇L(w_n)"), but it does not present them in a structured pseudocode block or algorithm box. |
| Open Source Code | No | No explicit statement about providing source code or a link to a repository for the described methodology was found in the paper. |
| Open Datasets | Yes | In Figure 1 (left) we show the loss trajectory of a neural network in a very basic example learning the standard MNIST digit classifier (Le Cun et al., 2010) by basic GD in a kernel regime (see Section H.1 for details). |
| Dataset Splits | Yes | We consider a shallow fully-connected ReLU network with 1000 hidden neurons and train it on the full MNIST dataset with standard train-test split. |
| Hardware Specification | No | No specific hardware details (exact GPU/CPU models, processor types, memory amounts, or detailed computer specifications) were found in the paper. |
| Software Dependencies | Yes | We used symbolic computation, e.g. Wolfram Mathematica (Wolfram Research, Inc.). [...] Wolfram Research, Inc. Mathematica, Version 13.2. |
| Experiment Setup | Yes | For constant rate GD and constant rate HB we used parameters α = 1 and β = 0.9. The scheduled HB algorithm uses schedule (40) for α_n, β_n with parameters a = ζ, b = 0, r = 1. The asymptotic scheduled HB uses the asymptotic (large-n) version of the Jacobi schedule given by the rightmost part of (40), and additionally sets α_n = 1 (a rather unnecessary artifact of our experimentation). [...] We consider a shallow fully-connected ReLU network of width N = 1000 with the NTK parametrization. Its function f(x) can be written as [...] Then we train this network on the full MNIST dataset with the standard train-test split. Importantly, we do not use mini-batches during training steps, but process the whole train dataset of size M = 50000 at each optimization step. |
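The constant-rate GD and Heavy Ball updates quoted above can be sketched on a toy quadratic with a power-law spectrum. This is a minimal illustration, not the paper's code: the problem instance (dimension `d`, spectral exponent `nu`, step counts) is assumed for demonstration, and only the parameters α = 1, β = 0.9 come from the quoted setup.

```python
import numpy as np

# Toy quadratic L(w) = 0.5 * w^T H w with diagonal H whose eigenvalues
# follow a power law lambda_k = k^{-nu}. Illustrative values only.
rng = np.random.default_rng(0)
d = 100
nu = 1.5
eigvals = np.arange(1, d + 1, dtype=float) ** (-nu)  # power-law spectrum
w0 = rng.standard_normal(d)

def loss(w):
    # L(w) = 0.5 * sum_k lambda_k * w_k^2
    return 0.5 * np.sum(eigvals * w**2)

def grad(w):
    # Gradient of the diagonal quadratic: lambda_k * w_k
    return eigvals * w

def gd(w, alpha=1.0, steps=200):
    # Constant-rate gradient descent: w_{n+1} = w_n - alpha * grad L(w_n)
    for _ in range(steps):
        w = w - alpha * grad(w)
    return w

def heavy_ball(w, alpha=1.0, beta=0.9, steps=200):
    # Constant-rate Heavy Ball:
    # w_{n+1} = w_n - alpha * grad L(w_n) + beta * (w_n - w_{n-1})
    w_prev = w.copy()
    for _ in range(steps):
        w_next = w - alpha * grad(w) + beta * (w - w_prev)
        w_prev, w = w, w_next
    return w
```

On such spectra, the slowly decaying tail of small eigenvalues is what makes momentum and scheduled methods attractive; both updates above contract every eigenmode, but at rates that depend strongly on λ_k.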