Rethinking the Variational Interpretation of Accelerated Optimization Methods
Authors: Peiyuan Zhang, Antonio Orvieto, Hadi Daneshmand
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | The continuous-time model of Nesterov's momentum provides a thought-provoking perspective for understanding the nature of the acceleration phenomenon in convex optimization. One of the main ideas in this line of research comes from the field of classical mechanics and proposes to link Nesterov's trajectory to the solution of a set of Euler-Lagrange equations relative to the so-called Bregman Lagrangian. In this work, we revisit this idea and provide an in-depth analysis of the action relative to the Bregman Lagrangian from the point of view of calculus of variations. Our main finding is that, while Nesterov's method is a stationary point for the action, it is often not a minimizer but instead a saddle point for this functional in the space of differentiable curves. This finding challenges the main intuition behind the variational interpretation of Nesterov's method and provides additional insights into the intriguing geometry of accelerated paths. (The Bregman Lagrangian and its action are sketched after the table.) |
| Researcher Affiliation | Academia | Peiyuan Zhang (ETH Zurich, talantyeri@gmail.com); Antonio Orvieto (ETH Zurich, aorvieto@ethz.ch); Hadi Daneshmand (Inria Paris, seyed.daneshmand@inria.fr) |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link for open-source code related to the methodology described. |
| Open Datasets | No | This is a theoretical paper that does not involve empirical experiments with datasets. The figures are illustrative plots of mathematical functions, not data from a trained model. |
| Dataset Splits | No | This is a theoretical paper that does not involve empirical experiments with datasets or data splitting. |
| Hardware Specification | No | This is a theoretical paper and does not mention any specific hardware used for experiments. It mentions 'Maple/Mathematica' for symbolic computation and 'Matlab' for numerical simulations, but no hardware. |
| Software Dependencies | No | The paper states that 'Symbolic computations are checked in Maple/Mathematica, numerical simulations are performed in Matlab.' However, it does not specify version numbers for these tools, nor does it list any other dependencies. |
| Experiment Setup | No | This is a theoretical paper and does not describe an experimental setup with hyperparameters or system-level training settings. Figure 3 mentions 'Simulation with Runge-Kutta 4 integration', which is a numerical method for solving ODEs rather than a model training setup (a minimal RK4 sketch follows this table). |
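
The abstract quoted in the Research Type row centers on the action of the Bregman Lagrangian. As a reference point only, the block below is a minimal sketch following the standard formulation of Wibisono, Wilson, and Jordan that this line of work builds on; the notation (the scaling functions alpha_t, beta_t, gamma_t, the distance-generating function h, and the Bregman divergence D_h) is assumed from that formulation, not reproduced from this report.

```latex
% Sketch, assuming the standard Bregman-Lagrangian notation:
% h is a convex distance-generating function, D_h its Bregman divergence,
% and alpha_t, beta_t, gamma_t are the usual time-dependent scaling functions.
\[
  \mathcal{L}(X, V, t)
    = e^{\alpha_t + \gamma_t}
      \Bigl( D_h\bigl(X + e^{-\alpha_t} V,\, X\bigr) - e^{\beta_t} f(X) \Bigr),
  \qquad
  D_h(y, x) = h(y) - h(x) - \langle \nabla h(x),\, y - x \rangle .
\]
% The action of a differentiable curve X over [t_0, t_1] is
\[
  \mathcal{J}[X] = \int_{t_0}^{t_1} \mathcal{L}\bigl(X_t, \dot{X}_t, t\bigr)\, \mathrm{d}t ,
\]
% and Nesterov-like trajectories arise as stationary points of this action,
% i.e. as solutions of the associated Euler-Lagrange equation. In the Euclidean
% case h(x) = (1/2)||x||^2 with the standard parameter choices, one recovers
% the continuous-time limit of Nesterov's method,
\[
  \ddot{X}(t) + \frac{3}{t}\, \dot{X}(t) + \nabla f\bigl(X(t)\bigr) = 0 .
\]
```

The paper's finding, as summarized in the abstract above, is that such stationary points are typically saddle points of the action in the space of differentiable curves rather than minimizers.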
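
The Experiment Setup row notes that Figure 3 of the paper uses Runge-Kutta 4 integration of an ODE. The following Python sketch is illustrative only and not taken from the paper (which runs its simulations in Matlab): the quadratic objective, initial condition, start time, and step size are all assumptions made for this example. It integrates the Nesterov ODE above after rewriting it as a first-order system.

```python
import numpy as np

# Assumed quadratic test objective f(x) = 0.5 * x^T A x (not from the paper).
A = np.diag([1.0, 10.0])
grad_f = lambda x: A @ x

def rhs(t, state):
    """First-order form of the Nesterov ODE x'' + (3/t) x' + grad f(x) = 0."""
    d = state.size // 2
    x, v = state[:d], state[d:]
    return np.concatenate([v, -(3.0 / t) * v - grad_f(x)])

def rk4_step(f, t, y, h):
    """One classical Runge-Kutta 4 step for y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)

# Start at a small t0 > 0 to avoid the 3/t singularity at t = 0.
t, h = 0.1, 1e-3
state = np.concatenate([np.array([1.0, 1.0]), np.zeros(2)])  # x(t0) and x'(t0) = 0
for _ in range(20_000):
    state = rk4_step(rhs, t, state, h)
    t += h

print("f(x(T)) =", 0.5 * state[:2] @ A @ state[:2])
```

Any standard ODE solver (e.g. `scipy.integrate.solve_ivp`) would serve equally well here; RK4 is written out explicitly only because it is the integrator named in the paper's figure caption.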