Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..

Tight analyses of first-order methods with error feedback

Authors: Daniel Berg Thomsen, Adrien Taylor, Aymeric Dieuleveut

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical In this work, we provide a tight analysis of both of these methods. Specifically, we find the Lyapunov function that yields the best possible convergence rate for each method with matching lower bounds. This principled approach yields sharp performance guarantees and enables a rigorous, apples-to-apples comparison between EF, EF21, and compressed gradient descent. Our analysis is carried out in the simplified single-agent setting, which allows for clean theoretical insights and fair comparison of the underlying mechanisms.
Researcher Affiliation Academia Daniel Berg Thomsen1,2 Adrien Taylor1 Aymeric Dieuleveut2 1INRIA, D.I. École Normale Supérieure, PSL Research University, 75005 Paris, France 2CMAP, CNRS, École polytechnique, Institut Polytechnique de Paris, 91120 Palaiseau, France
Pseudocode Yes Algorithm 1 Compressed gradient descent (CGD) 1: initialization: x0 Rd, η > 0 2: for k = 0, 1, 2, . . . , N do 3: Agent i [n] compresses fi(xk) and communicates m(i) k := C( fi(xk)) 4: Server updates xk+1 xk η 1 n Pn i=1 m(i) k 5: end for
Open Source Code Yes The source code for all the experiments is publically available in the following Git Hub repository: https://github.com/Daniel Berg Thomsen/error-feedback-tight
Open Datasets No The paper focuses on theoretical analysis using abstract function classes (Fµ,L for smooth, strongly convex functions) rather than specific empirical datasets. No external dataset access information is provided.
Dataset Splits No The paper does not use any specific empirical dataset but rather analyzes optimization methods on a class of theoretical functions (smooth, strongly convex functions). Therefore, dataset splits are not applicable.
Hardware Specification Yes All experiments were run on a Mac Book Pro with an M4 Max processor.
Software Dependencies Yes To do so, the estimation of the worst-case rate is formulated as a semidefinite program (SDP), which is then solved numerically using standard solvers such as MOSEK [57]. ... To arrive at simple and readable proofs, we leverage the computer algebra system of Mathematica [63].
Experiment Setup Yes All contour plots were evaluated over a grid with ϵ [0.01, 0.99], and η [0.01, 2 L+µ ], where µ is the smallest µ specified in the caption of each figure (except for Figure 7, where it is set to 0.1). Each axis was discretized with a resolution of 200 points. To generate each non-cyclic point, we used the following procedure: 1. For each method, we computed the optimal Lyapunov function (without additional constraints) via bisection on the contraction factor ρ, up to a precision of 10-6. 2. Using the resulting Lyapunov function, we then computed the worst-case contraction factor using PEPit [58] and the MOSEK solver [57].