Complexity of Highly Parallel Non-Smooth Convex Optimization
Authors: Sébastien Bubeck, Qijia Jiang, Yin Tat Lee, Yuanzhi Li, Aaron Sidford
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | A landmark result of non-smooth convex optimization is that gradient descent is an optimal algorithm whenever the number of computed gradients is smaller than the dimension d. In this paper we study the extension of this result to the parallel optimization setting. Namely we consider optimization algorithms interacting with a highly parallel gradient oracle, that is one that can answer poly(d) gradient queries in parallel. We show that in this case gradient descent is optimal only up to Õ(√d) rounds of interactions with the oracle. The lower bound improves upon a decades old construction by Nemirovski which proves optimality only up to d^(1/3) rounds (as recently observed by Balkanski and Singer), and the suboptimality of gradient descent after √d rounds was already observed by Duchi, Bartlett and Wainwright. In the latter regime we propose a new method with improved complexity, which we conjecture to be optimal. The analysis of this new method is based upon a generalized version of the recent results on optimal acceleration for highly smooth convex optimization. |
| Researcher Affiliation | Collaboration | Sébastien Bubeck (Microsoft Research, sebubeck@microsoft.com); Qijia Jiang (Stanford University, qjiang2@stanford.edu); Yin Tat Lee (University of Washington & Microsoft Research, yintat@uw.edu); Yuanzhi Li (Stanford University, yuanzhil@stanford.edu); Aaron Sidford (Stanford University, sidford@stanford.edu) |
| Pseudocode | Yes | Algorithm 1: Compute vector field approximating ∇g; Algorithm 2: Approximate minimization of g(y) + Φ(‖y − c‖) |
| Open Source Code | No | The paper is theoretical and does not contain any explicit statements or links indicating that open-source code for the described methodology is provided or made available. |
| Open Datasets | No | The paper describes theoretical results and algorithms. It refers to abstract functions like a 'random function f' or 'convolved function' for theoretical analysis and proofs, but does not mention the use of any specific real-world or publicly available datasets with access information (link, DOI, or formal citation). |
| Dataset Splits | No | The paper is theoretical and does not conduct empirical experiments on datasets, thus it does not provide any training, validation, or test dataset splits. |
| Hardware Specification | No | The paper is theoretical and focuses on algorithm design and complexity. It does not provide any specific details about the hardware used, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper is theoretical and does not describe any specific software dependencies with version numbers that would be needed to replicate experiments. |
| Experiment Setup | No | The paper is theoretical and does not involve empirical experiments. Therefore, it does not provide details on experimental setup, such as hyperparameters, training configurations, or system-level settings. |
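
A note on the √d threshold quoted in the abstract (Research Type row): the crossover can be checked with a back-of-envelope comparison of round complexities for a 1-Lipschitz convex function over the unit ball. The rates below are the standard statements for (sub)gradient descent and for the randomized-smoothing approach of Duchi, Bartlett and Wainwright; they are quoted here only as an illustration of the reasoning and are not taken from this paper's proofs.

```latex
% Rounds of oracle interaction needed to reach accuracy \epsilon
% (standard rates, stated up to constants and log factors):
\[
T_{\mathrm{GD}}(\epsilon) = O\!\left(\epsilon^{-2}\right),
\qquad
T_{\mathrm{DBW}}(\epsilon) = O\!\left(d^{1/4}\,\epsilon^{-1}\right).
\]
% The parallel method wins exactly in the high-accuracy regime:
\[
T_{\mathrm{DBW}}(\epsilon) < T_{\mathrm{GD}}(\epsilon)
\;\Longleftrightarrow\;
d^{1/4}\,\epsilon^{-1} < \epsilon^{-2}
\;\Longleftrightarrow\;
\epsilon < d^{-1/4}
\;\Longleftrightarrow\;
T_{\mathrm{GD}}(\epsilon) > \sqrt{d},
\]
% i.e. gradient descent can only be beaten after roughly \sqrt{d} rounds,
% which is consistent with the \tilde{O}(\sqrt{d})-round optimality
% threshold stated in the abstract.
```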
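
The Pseudocode row mentions "Algorithm 1: Compute vector field approximating ∇g". As background, the sketch below shows the generic randomized-smoothing idea such a step builds on: the gradient of a smoothed surrogate g(x) = E_u[f(x + u)] can be estimated by averaging many independent (sub)gradient queries of f, all of which fit into a single round of a highly parallel oracle. The function names, the ball-smoothing kernel, and the sample count are illustrative assumptions, not the paper's actual Algorithm 1 or its error analysis.

```python
import numpy as np

def smoothed_grad_estimate(subgrad_oracle, x, radius=0.1, num_queries=1000, rng=None):
    """Estimate the gradient of a smoothed surrogate g(x) = E_u[f(x + u)],
    where u is drawn uniformly from a Euclidean ball of the given radius.

    All `num_queries` (sub)gradient queries are independent, so in the
    highly parallel oracle model they count as a single round of interaction.
    Illustrative sketch only, not a reproduction of the paper's Algorithm 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    # Sample perturbations uniformly from the ball: uniform direction on the
    # sphere, radius scaled by U^(1/d) for a uniform-in-volume draw.
    directions = rng.standard_normal((num_queries, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = radius * rng.random(num_queries) ** (1.0 / d)
    perturbations = directions * radii[:, None]
    # One parallel round: query the (sub)gradient oracle at all perturbed points.
    grads = np.stack([subgrad_oracle(x + u) for u in perturbations])
    # The empirical average approximates the gradient of the convolved function g.
    return grads.mean(axis=0)

# Toy usage: f(x) = ||x||_1, with sign(x) as a subgradient.
if __name__ == "__main__":
    f_subgrad = lambda x: np.sign(x)
    x0 = np.ones(50)
    print(smoothed_grad_estimate(f_subgrad, x0)[:5])
```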