Necessary and Sufficient Geometries for Gradient Methods
Authors: Daniel Levy, John C. Duchi
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study the impact of the constraint set and gradient geometry on the convergence of online and stochastic methods for convex optimization, providing a characterization of the geometries for which stochastic gradient and adaptive gradient methods are (minimax) optimal. In particular, we show that when the constraint set is quadratically convex, diagonally pre-conditioned stochastic gradient methods are minimax optimal. We further provide a converse that shows that when the constraints are not quadratically convex (for example, any ℓp-ball for p < 2), the methods are far from optimal. Based on this, we can provide concrete recommendations for when one should use adaptive, mirror, or stochastic gradient methods. |
| Researcher Affiliation | Academia | Daniel Levy (Stanford University, danilevy@stanford.edu); John C. Duchi (Stanford University, jduchi@stanford.edu) |
| Pseudocode | No | The paper describes algorithms mathematically (e.g., mirror descent updates), but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not mention or provide access to any open-source code for the methodology described. |
| Open Datasets | No | This is a theoretical paper focused on mathematical proofs and analyses, not empirical studies involving datasets or training. Therefore, no information about publicly available training datasets is provided. |
| Dataset Splits | No | This is a theoretical paper and does not involve experimental validation with dataset splits. No specific dataset split information (percentages, sample counts, or citations to predefined splits) is provided. |
| Hardware Specification | No | This is a theoretical paper and does not describe any experiments that would require hardware. Therefore, no hardware specifications are provided. |
| Software Dependencies | No | This is a theoretical paper and does not mention any software dependencies with specific version numbers. |
| Experiment Setup | No | This is a theoretical paper that does not describe empirical experiments with hyperparameters or training configurations. Therefore, no experimental setup details are provided. |
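The paper itself contains no pseudocode, but the abstract's central object — a diagonally preconditioned stochastic gradient step over a quadratically convex constraint set — has a standard form. The sketch below is illustrative only (an AdaGrad-style update with projection onto an ℓ∞-ball, a simple quadratically convex set); the function name, step size, and constraint radius are our own assumptions, not taken from the paper.

```python
import numpy as np

def adagrad_step(x, g, G_sq, lr=0.1, eps=1e-8, radius=1.0):
    """One diagonally preconditioned (AdaGrad-style) SGD step, followed by
    projection onto the l_inf ball of the given radius -- a quadratically
    convex constraint set of the kind the paper shows is favorable.

    Illustrative sketch, not the paper's exact method.
      x    -- current iterate
      g    -- stochastic gradient at x
      G_sq -- running sum of squared gradient coordinates
    """
    G_sq = G_sq + g * g                     # accumulate per-coordinate scale
    step = lr * g / (np.sqrt(G_sq) + eps)   # diagonal preconditioning
    x_new = np.clip(x - step, -radius, radius)  # Euclidean projection onto the box
    return x_new, G_sq
```

For a non-quadratically-convex set such as an ℓp-ball with p < 2, the paper's converse indicates this diagonal scheme can be far from minimax optimal, and a mirror descent update with a suitable non-Euclidean distance-generating function would be the natural alternative.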