Algorithmic Instabilities of Accelerated Gradient Descent
Authors: Amit Attia, Tomer Koren
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We disprove this conjecture and show, for two notions of algorithmic stability (including uniform stability), that the stability of Nesterov's accelerated method in fact deteriorates exponentially fast with the number of gradient steps. This stands in sharp contrast to the bounds in the quadratic case, but also to known results for non-accelerated gradient methods, where stability typically grows linearly with the number of steps. |
| Researcher Affiliation | Collaboration | Amit Attia, Blavatnik School of Computer Science, Tel Aviv University (amitattia@mail.tau.ac.il); Tomer Koren, Blavatnik School of Computer Science, Tel Aviv University, and Google Research (tkoren@tauex.tau.ac.il) |
| Pseudocode | No | The paper describes the Nesterov Accelerated Gradient method using mathematical equations (1) and (2) and derivations, but it does not provide a formal pseudocode block or algorithm listing (a generic sketch of the standard update is given below the table). |
| Open Source Code | No | The paper does not include any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper constructs synthetic functions and samples for its theoretical analysis, stating: "We will use the construction from Section 3 with the properties of Theorem 3 in order to create a loss function and samples which will have the same optimization for t 1." It does not use or provide access to any publicly available datasets. |
| Dataset Splits | No | The paper focuses on theoretical analysis and does not involve empirical training or validation on datasets with explicit splits. Therefore, no train/validation/test dataset splits are provided. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for any computations or simulations. |
| Software Dependencies | No | The paper does not list any specific software components or libraries with their version numbers that would be required to reproduce the work. |
| Experiment Setup | No | The paper defines mathematical parameters (e.g., G, β, η, ϵ) for its theoretical constructions, but these are not experimental hyperparameters or system-level training settings for a practical implementation. |
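
As noted in the Pseudocode row, the paper specifies the Nesterov Accelerated Gradient iteration through its equations (1) and (2) rather than pseudocode. For orientation only, the following is a minimal textbook-style sketch of that iteration in Python, assuming a generic gradient oracle `grad`, a step size `eta` (e.g., 1/β for a β-smooth objective), and the common (t-1)/(t+2) momentum schedule; the paper's exact parameterization may differ, and nothing here reproduces its lower-bound construction.

```python
import numpy as np

def nesterov_accelerated_gradient(grad, x0, eta, num_steps):
    """Illustrative sketch of the standard Nesterov accelerated gradient update.

    grad      : callable returning the gradient of the objective at a point
    x0        : initial iterate (array-like)
    eta       : step size (e.g., 1/beta for a beta-smooth objective)
    num_steps : number of gradient steps
    """
    x_prev = np.asarray(x0, dtype=float)
    y = x_prev.copy()
    for t in range(1, num_steps + 1):
        x = y - eta * grad(y)             # gradient step from the extrapolated point
        momentum = (t - 1) / (t + 2)      # a common momentum schedule (assumed here)
        y = x + momentum * (x - x_prev)   # extrapolation (momentum) step
        x_prev = x
    return x_prev

# Example usage: minimize the quadratic f(x) = 0.5 * ||x||^2
if __name__ == "__main__":
    f_grad = lambda x: x
    print(nesterov_accelerated_gradient(f_grad, np.ones(3), eta=0.1, num_steps=100))
```

This sketch only illustrates the iteration whose stability the paper analyzes; the stability results themselves are established analytically, not experimentally.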