Analytical Study of Momentum-Based Acceleration Methods in Paradigmatic High-Dimensional Non-Convex Problems
Authors: Stefano Sarao Mannelli, Pierfrancesco Urbani
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we use dynamical mean field theory techniques to describe analytically the average dynamics of these methods in a prototypical non-convex model: the (spiked) matrix-tensor model. We derive a closed set of equations that describe the behaviour of heavy-ball momentum and Nesterov acceleration in the infinite dimensional limit. By numerical integration of these equations, we observe that these methods speed up the dynamics but do not improve the algorithmic threshold with respect to gradient descent in the spiked model. Our non-rigorous results are checked against extensive numerical simulations. |
| Researcher Affiliation | Academia | Stefano Sarao Mannelli Department of Experimental Psychology University of Oxford Oxford, United Kingdom stefano.saraomannelli@psy.ox.ac.uk Pierfrancesco Urbani Université Paris-Saclay, CNRS, CEA Institut de physique théorique Gif-sur-Yvette, France pierfrancesco.urbani@ipht.fr |
| Pseudocode | No | The paper defines the algorithms (Nesterov acceleration, Polyak's or heavy ball momentum, gradient descent) using mathematical equations (3-7), but does not present them in a pseudocode or algorithm block format. |
| Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] We use standard algorithms and provide details on the parameters. They can be easily reproduced on any computer. |
| Open Datasets | No | The paper defines and uses generative models (mixed p-spin model and spiked matrix-tensor model) for its analysis and simulations. It does not refer to or provide access to pre-existing, publicly available datasets. |
| Dataset Splits | No | The paper conducts numerical simulations on generative models and provides simulation parameters. However, it does not specify explicit training, validation, or test dataset splits in the conventional sense of fixed datasets. |
| Hardware Specification | No | Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [No] The data for each figure can be obtained in at most one day of laptop simulation. |
| Software Dependencies | No | The paper provides model and algorithm parameters for the simulations (e.g., p=3, alpha=0.01, beta=0.9) but does not specify any software dependencies with version numbers (e.g., programming languages, libraries, or specific solvers). |
| Experiment Setup | Yes | All the figures resulting from the experiments contain details to reproduce them. For example, Figure 1 specifies: 'The simulations in the figures have parameters p = 3, σ3 = 2/p, σ2 = 1, ridge parameter µ = 10 and input dimension N = 1024. ... heavy ball momentum in blue with α = 0.01 and β = 0.9;' |
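
Since the paper gives its algorithms only as equations, the three update rules named in the table can be illustrated with a minimal sketch in their common textbook form. This is not the paper's equations (3-7), which are not reproduced on this page, and it may differ from them in detail. The step size α = 0.01 and momentum β = 0.9 are the values quoted for Figure 1. The toy quadratic loss, its `CURVATURES`, and all function names are assumptions made purely for illustration. They stand in for the spiked matrix-tensor model, which is not implemented here.

```python
# Textbook update rules on a hypothetical toy problem.
# This is NOT the paper's model or its exact equations.
ALPHA = 0.01               # step size (value quoted for Figure 1)
BETA = 0.9                 # momentum coefficient (value quoted for Figure 1)
CURVATURES = [1.0, 10.0]   # assumed toy quadratic, for illustration only


def loss(x):
    """Toy loss f(x) = 0.5 * sum_i c_i * x_i^2."""
    return 0.5 * sum(c * xi * xi for c, xi in zip(CURVATURES, x))


def grad(x):
    """Gradient of the toy loss."""
    return [c * xi for c, xi in zip(CURVATURES, x)]


def gradient_descent(x, steps):
    """Plain gradient descent: x <- x - alpha * grad(x)."""
    for _ in range(steps):
        g = grad(x)
        x = [xi - ALPHA * gi for xi, gi in zip(x, g)]
    return x


def heavy_ball(x, steps):
    """Polyak (heavy-ball) momentum: the gradient is taken at x itself."""
    v = [0.0] * len(x)
    for _ in range(steps):
        g = grad(x)
        v = [BETA * vi - ALPHA * gi for vi, gi in zip(v, g)]
        x = [xi + vi for xi, vi in zip(x, v)]
    return x


def nesterov(x, steps):
    """Nesterov acceleration: the gradient is taken at a look-ahead point."""
    v = [0.0] * len(x)
    for _ in range(steps):
        lookahead = [xi + BETA * vi for xi, vi in zip(x, v)]
        g = grad(lookahead)
        v = [BETA * vi - ALPHA * gi for vi, gi in zip(v, g)]
        x = [xi + vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    start = [1.0, 1.0]
    for name, method in [("gradient descent", gradient_descent),
                         ("heavy ball", heavy_ball),
                         ("Nesterov", nesterov)]:
        print(name, loss(method(list(start), 200)))
```

On this convex toy problem both momentum variants reach a lower loss than gradient descent in the same number of steps. That only illustrates a speed-up and says nothing about the algorithmic threshold in the spiked model, which the paper reports as unchanged.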