Necessary and Sufficient Geometries for Gradient Methods

Authors: Daniel Levy, John C. Duchi

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study the impact of the constraint set and gradient geometry on the convergence of online and stochastic methods for convex optimization, providing a characterization of the geometries for which stochastic gradient and adaptive gradient methods are (minimax) optimal. In particular, we show that when the constraint set is quadratically convex, diagonally pre-conditioned stochastic gradient methods are minimax optimal. We further provide a converse that shows that when the constraints are not quadratically convex (for example, any ℓp-ball for p < 2), the methods are far from optimal. Based on this, we can provide concrete recommendations for when one should use adaptive, mirror, or stochastic gradient methods. (An illustrative sketch of a diagonally preconditioned update appears after this table.)
Researcher Affiliation | Academia | Daniel Levy (Stanford University, danilevy@stanford.edu); John C. Duchi (Stanford University, jduchi@stanford.edu)
Pseudocode | No | The paper describes its algorithms mathematically (e.g., mirror descent updates) but does not include any pseudocode or algorithm blocks. (An illustrative mirror descent step is sketched after this table.)
Open Source Code | No | The paper does not mention or provide access to any open-source code for the methodology described.
Open Datasets | No | This is a theoretical paper focused on mathematical proofs and analyses, not empirical studies involving datasets or training; no information about publicly available training datasets is provided.
Dataset Splits | No | This is a theoretical paper that does not involve experimental validation with dataset splits; no split information (percentages, sample counts, or citations to predefined splits) is provided.
Hardware Specification | No | This is a theoretical paper and does not describe any experiments that would require hardware, so no hardware specifications are provided.
Software Dependencies | No | This is a theoretical paper and does not mention any software dependencies with specific version numbers.
Experiment Setup | No | This is a theoretical paper that does not describe empirical experiments with hyperparameters or training configurations, so no experimental setup details are provided.
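
Since the paper releases no pseudocode or code, the following is a minimal, hypothetical sketch of the kind of diagonally preconditioned (AdaGrad-style) stochastic gradient update the abstract refers to. The quadratic toy objective, step size, and epsilon smoothing are illustrative assumptions, not the paper's exact construction.

    import numpy as np

    def diag_preconditioned_sgd_step(x, grad, accum, lr=0.1, eps=1e-8):
        """One AdaGrad-style step: the per-coordinate step size
        lr / sqrt(sum of squared past gradients) acts as a diagonal
        preconditioner. (Illustrative sketch, not the paper's exact method.)"""
        accum = accum + grad ** 2                     # running sum of squared gradients
        x = x - lr * grad / (np.sqrt(accum) + eps)    # per-coordinate scaled step
        return x, accum

    # Hypothetical usage: noisy gradients of f(x) = 0.5 * ||x - 1||^2
    rng = np.random.default_rng(0)
    x, accum = np.zeros(5), np.zeros(5)
    for _ in range(200):
        grad = (x - 1.0) + 0.1 * rng.standard_normal(5)  # stochastic gradient
        x, accum = diag_preconditioned_sgd_step(x, grad, accum)

In the paper's constrained setting, each step would additionally be projected onto the (quadratically convex) constraint set; the projection is omitted here for brevity.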
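For contrast, a mirror descent update with the entropic mirror map (exponentiated gradient) is the classic choice for simplex-like geometries, where the paper's converse shows diagonal methods are far from optimal. This is again a hedged sketch under the assumption of probability-simplex constraints, not the paper's notation.

    import numpy as np

    def mirror_descent_entropy_step(x, grad, lr=0.1):
        """One mirror descent step with the entropic mirror map
        (exponentiated gradient). The multiplicative update followed by
        renormalization keeps the iterate on the probability simplex."""
        w = x * np.exp(-lr * grad)
        return w / w.sum()

    # Hypothetical usage: start from the uniform distribution on 5 coordinates
    x = np.full(5, 1.0 / 5)
    x = mirror_descent_entropy_step(x, grad=np.array([1.0, 0.0, -1.0, 0.5, -0.5]))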