Riemannian stochastic optimization methods avoid strict saddle points

Authors: Ya-Ping Hsieh, Mohammad Reza Karimi Jaghargh, Andreas Krause, Panayotis Mertikopoulos

NeurIPS 2023

Reproducibility assessment (variable, result, and LLM response):
Research Type: Experimental
In this section, we aim to demonstrate the practical applicability of the theoretical framework proposed in our paper by providing numerical illustrations. To do so, we use a 2-dimensional torus as the optimization landscape, where the complexity and multi-modal nature of the objective function can be easily visualized in Fig. 2. Our objective function features three saddle points (in black) and one global minimizer (in red). We initialize two RRM schemes, (RSGD) and (RSEG), in proximity to these saddle points; this choice rigorously tests their ability to navigate the landscape and converge to the global optimum. In line with our theoretical predictions, both RRM methods avoid the saddle points and eventually reach the global minimum. This empirical confirmation reinforces the central message of our paper: stochastic Riemannian schemes only converge to local minimizers.
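A saddle-avoidance experiment of this kind can be sketched in a few lines. The sketch below is an illustration only, not the paper's code: it runs (RSGD) on the flat 2-torus with a toy two-cosine objective (which has one minimizer and two saddle points, unlike the paper's three), Robbins–Monro step sizes gamma_n proportional to n^{-0.6}, and an initialization near a saddle point.

```python
import numpy as np

def f(theta):
    # Toy multi-modal objective on the flat 2-torus (angles mod 2*pi).
    # Critical points: one maximum (0, 0), one minimizer (pi, pi),
    # and two saddles (0, pi) and (pi, 0).
    return np.cos(theta[0]) + np.cos(theta[1])

def grad_f(theta):
    return np.array([-np.sin(theta[0]), -np.sin(theta[1])])

def rsgd(theta0, steps=5000, gamma0=0.5, noise=0.1, seed=0):
    """Riemannian SGD on the flat torus: the exponential map is a step
    in the tangent plane followed by wrapping the angles mod 2*pi."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    for n in range(1, steps + 1):
        gamma = gamma0 / n ** 0.6        # Robbins-Monro step-size schedule
        g = grad_f(theta) + noise * rng.standard_normal(2)
        theta = (theta - gamma * g) % (2 * np.pi)  # exp map on the flat torus
    return theta

# Initialize near the saddle (0, pi): the iterates should escape it
# and approach the minimizer (pi, pi), where f = -2.
theta_final = rsgd([0.05, np.pi - 0.05])
```

On the flat torus the exponential map is exact (a straight step followed by wrapping), so this coincides with the retraction-based variant; on curved manifolds the two differ.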
Researcher Affiliation: Academia
Ya-Ping Hsieh, Mohammad Reza Karimi, Andreas Krause (ETH Zürich; {yaping.hsieh, mkarimi}@inf.ethz.ch, krausea@ethz.ch); Panayotis Mertikopoulos (Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LIG & Archimedes RU, NKUA; panayotis.mertikopoulos@imag.fr)
Pseudocode: Yes
Algorithm 1: Riemannian stochastic gradient descent; Algorithm 2: Retraction-based stochastic gradient descent; Algorithm 3: Stochastic mirror descent; Algorithm 4: Riemannian optimistic gradient; Algorithm 5: Natural gradient descent; Algorithm 6: Riemannian proximal point methods; Algorithm 7: Riemannian stochastic extra-gradient.
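To illustrate the retraction-based scheme (Algorithm 2), the following sketch — our own, not taken from the paper — runs retraction-based stochastic gradient descent on the unit sphere, using tangent-space projection followed by normalization as the retraction. The Rayleigh-quotient objective is a hypothetical example chosen because its minimizer on the sphere is known.

```python
import numpy as np

def retraction_sgd_sphere(grad, x0, steps=1000, gamma0=0.1, noise=0.01, seed=0):
    """Retraction-based SGD on the unit sphere S^{n-1}: project the
    Euclidean gradient onto the tangent space, take a step, then
    retract by renormalizing (a standard retraction on the sphere)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    x = x / np.linalg.norm(x)
    for n in range(1, steps + 1):
        g = grad(x) + noise * rng.standard_normal(x.shape)
        g_tan = g - (g @ x) * x                # tangent-space projection
        v = -(gamma0 / n ** 0.6) * g_tan       # Robbins-Monro step
        x = (x + v) / np.linalg.norm(x + v)    # retraction: normalize
    return x

# Hypothetical example: minimizing the Rayleigh quotient x^T A x over
# the sphere should drive x toward an eigenvector of A's smallest
# eigenvalue (here, approximately +/- e3 for the diagonal A below).
A = np.diag([3.0, 2.0, 1.0])
x_star = retraction_sgd_sphere(lambda x: 2 * A @ x, [1.0, 1.0, 1.0])
```

Normalization is only one choice of retraction; it agrees with the exponential map to first order, which is all the retraction-based analysis requires.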
Open Source Code: No
The paper does not contain any explicit statement about releasing source code for the methods described, nor does it provide a link to a code repository.
Open Datasets: No
The paper uses a "2-dimensional torus as the optimization landscape" for numerical illustrations, which is a synthetic landscape, not a publicly available dataset with concrete access information (e.g., URL, DOI, citation).
Dataset Splits: No
The paper describes numerical illustrations on a "2-dimensional torus as the optimization landscape" and does not specify any training, validation, or test splits for data.
Hardware Specification: No
The paper does not provide specific hardware details (e.g., CPU/GPU models, memory specifications) used for running the numerical illustrations.
Software Dependencies: No
The paper does not provide specific software dependencies (e.g., library names with version numbers, such as PyTorch 1.9 or Python 3.8) used for the numerical illustrations.
Experiment Setup: No
The paper states: "We subject two RRM schemes, i.e., (RSGD) and (RSEG), to initialization in proximity to these saddle points." This gives minimal detail about the experimental setup and omits concrete hyperparameters and other system-level training settings.