Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Tight Convergence Rate Bounds for Optimization Under Power Law Spectral Conditions
Authors: Maksim Velikanov, Dmitry Yarotsky
JMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that the obtained convergence bounds and acceleration strategies are not only relevant for exactly quadratic optimization problems, but also fairly accurate when applied to the training of neural networks. Keywords: Gradient Descent, Steepest Descent, Heavy Ball, Conjugate Gradients, power-law spectrum, convergence rate, tight bounds, non-strongly-convex least squares, acceleration, neural networks |
| Researcher Affiliation | Collaboration | Maksim Velikanov EMAIL Technology Innovation Institute, Abu Dhabi, UAE; CMAP, Ecole Polytechnique, Paris, France Dmitry Yarotsky EMAIL Center for Artificial Intelligence Technology Skolkovo Institute of Science and Technology Moscow, Russia |
| Pseudocode | No | The paper describes optimization algorithms (Gradient Descent, Steepest Descent, Heavy Ball, Conjugate Gradients) using mathematical formulas and descriptions (e.g., "w_{n+1} = w_n − α_n ∇L(w_n)"), but it does not present them in a structured pseudocode block or algorithm box. |
| Open Source Code | No | No explicit statement about providing source code or a link to a repository for the described methodology was found in the paper. |
| Open Datasets | Yes | In Figure 1 (left) we show the loss trajectory of a neural network in a very basic example learning the standard MNIST digit classifier (Le Cun et al., 2010) by basic GD in a kernel regime (see Section H.1 for details). |
| Dataset Splits | Yes | We consider a shallow fully-connected ReLU network with 1000 hidden neurons and train it on the full MNIST dataset with standard train-test split. |
| Hardware Specification | No | No specific hardware details (exact GPU/CPU models, processor types, memory amounts, or detailed computer specifications) were found in the paper. |
| Software Dependencies | Yes | We used symbolic computation, e.g. Wolfram Mathematica (Wolfram Research, Inc.). [...] Wolfram Research, Inc. Mathematica, Version 13.2. |
| Experiment Setup | Yes | For constant rate GD and constant rate HB we used parameters α = 1 and β = 0.9. The scheduled HB algorithm uses schedule (40) for α_n, β_n with parameters a = ζ, b = 0, r = 1. The asymptotic scheduled HB uses the asymptotic (large-n) version of the Jacobi schedule given by the rightmost part of (40), and additionally sets α_n = 1 (a rather unnecessary artifact of our experimentation). [...] We consider a shallow fully-connected ReLU network of width N = 1000 with the NTK parametrization. Its function f(x) can be written as [...] Then we train this network on the full MNIST dataset with the standard train-test split. Importantly, we do not use mini-batches during training steps, but process the whole train dataset of size M = 50000 at each optimization step. |
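The constant-rate GD and Heavy Ball updates quoted above can be sketched on a toy quadratic with a power-law spectrum. This is a minimal illustration, not the paper's code: the problem instance (dimension `d`, spectral exponent `nu`, step counts) is assumed for demonstration, and only the parameters α = 1, β = 0.9 come from the quoted setup.

```python
import numpy as np

# Toy quadratic L(w) = 0.5 * w^T H w with diagonal H whose eigenvalues
# follow a power law lambda_k = k^{-nu}. Illustrative values only.
rng = np.random.default_rng(0)
d = 100
nu = 1.5
eigvals = np.arange(1, d + 1, dtype=float) ** (-nu)  # power-law spectrum
w0 = rng.standard_normal(d)

def loss(w):
    # L(w) = 0.5 * sum_k lambda_k * w_k^2
    return 0.5 * np.sum(eigvals * w**2)

def grad(w):
    # Gradient of the diagonal quadratic: lambda_k * w_k
    return eigvals * w

def gd(w, alpha=1.0, steps=200):
    # Constant-rate gradient descent: w_{n+1} = w_n - alpha * grad L(w_n)
    for _ in range(steps):
        w = w - alpha * grad(w)
    return w

def heavy_ball(w, alpha=1.0, beta=0.9, steps=200):
    # Constant-rate Heavy Ball:
    # w_{n+1} = w_n - alpha * grad L(w_n) + beta * (w_n - w_{n-1})
    w_prev = w.copy()
    for _ in range(steps):
        w_next = w - alpha * grad(w) + beta * (w - w_prev)
        w_prev, w = w, w_next
    return w
```

On such spectra, the slowly decaying tail of small eigenvalues is what makes momentum and scheduled methods attractive; both updates above contract every eigenmode, but at rates that depend strongly on λ_k.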