Optimal Stochastic Non-smooth Non-convex Optimization through Online-to-Non-convex Conversion

Authors: Ashok Cutkosky, Harsh Mehta, Francesco Orabona

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We present new algorithms for optimizing non-smooth, non-convex stochastic objectives based on a novel analysis technique. This improves the current best-known complexity for finding a (δ, ϵ)-stationary point from O(ϵ^{-4} δ^{-1}) stochastic gradient queries to O(ϵ^{-3} δ^{-1}), which we also show to be optimal. Our primary technique is a reduction from non-smooth non-convex optimization to online learning, after which our results follow from standard regret bounds in online learning. For deterministic and second-order smooth objectives, applying more advanced optimistic online learning techniques enables a new complexity of O(ϵ^{-1.5} δ^{-0.5}).
Researcher Affiliation | Collaboration | (1) Department of Electrical & Computer Engineering, Boston University, Boston, MA, USA; (2) Google Research, Mountain View, CA, USA. Correspondence to: Ashok Cutkosky <ashok@cutkosky.com>, Harsh Mehta <harshm@google.com>, Francesco Orabona <francesco@orabona.com>.
Pseudocode | Yes | Algorithm 1: Online-to-Non-Convex Conversion (a hedged code sketch of this conversion appears after the table).
Open Source Code | No | The paper does not contain any explicit statement about releasing the source code for the methodology described, nor does it provide a link to a code repository.
Open Datasets | No | The paper is theoretical and focuses on algorithm complexity and proofs. It does not conduct experiments on specific datasets and therefore does not mention or provide access information for any publicly available or open datasets for training.
Dataset Splits | No | This is a theoretical paper focused on mathematical analysis and algorithm complexity. It does not involve empirical experiments with datasets and therefore does not provide details on training, validation, or test dataset splits.
Hardware Specification | No | The paper focuses on theoretical analysis and algorithm complexity. It does not describe any empirical experiments that would require specific hardware to run.
Software Dependencies | No | The paper is theoretical and outlines algorithms and proofs. It does not mention any specific software dependencies with version numbers that would be required to replicate empirical experiments.
Experiment Setup | No | The paper is theoretical and does not describe empirical experiments. Consequently, it does not provide specific details about experimental setup, hyperparameters, or system-level training settings.
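
To connect the table's "Pseudocode: Yes" entry to something executable, below is a minimal Python sketch of the online-to-non-convex conversion described in the abstract. It instantiates the inner online learner as projected online gradient descent over a ball of radius D (one standard choice, with O(D√T) regret on linear losses); the function name o2nc, the step size η = D/√T, the window length K, and the rule for selecting the output iterate are illustrative assumptions, not the paper's exact prescription. Recall that a point x is (δ, ϵ)-stationary if some convex combination of gradients taken at points within distance δ of x has norm at most ϵ; the window-averaged gradients below play the role of such a certificate, since all query points in a window lie within roughly K·D of one another.

```python
import numpy as np

def o2nc(grad_oracle, x0, T=1000, D=0.05, K=100, seed=0):
    """Sketch of an online-to-non-convex conversion.

    grad_oracle(w) should return a stochastic (sub)gradient of the
    objective at w as a NumPy array. The inner online learner here is
    projected online gradient descent over the ball of radius D.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    delta = np.zeros_like(x)      # online learner's current prediction
    eta = D / np.sqrt(T)          # illustrative fixed step size
    window, candidates = [], []

    for t in range(1, T + 1):
        s = rng.uniform()                     # s_t ~ Uniform[0, 1]
        w = x + s * delta                     # query a random point on the
        g = grad_oracle(w)                    #   segment [x_{t-1}, x_t]
        x = x + delta                         # x_t = x_{t-1} + Delta_t
        window.append(g)
        if t % K == 0:                        # close a window of K steps
            candidates.append((np.mean(window, axis=0), x.copy()))
            window = []
        # Online gradient descent on the linear loss <g_t, .>,
        # projected back onto the ball of radius D.
        delta = delta - eta * g
        nrm = np.linalg.norm(delta)
        if nrm > D:
            delta *= D / nrm

    # Return the iterate whose window-averaged gradient is smallest; the
    # paper's output selection is more careful, this is a stand-in.
    return min(candidates, key=lambda c: np.linalg.norm(c[0]))[1]

# Toy usage on noisy subgradients of the non-smooth objective f(x) = ||x||_1.
noise = np.random.default_rng(1)
oracle = lambda w: np.sign(w) + 0.1 * noise.standard_normal(w.shape)
x_hat = o2nc(oracle, x0=np.ones(5), T=2000, D=0.05, K=100)
```

Querying the gradient at a uniformly random point on the segment between consecutive iterates is the key design choice: it is what lets regret bounds on the linear losses ⟨g_t, ·⟩ translate into objective decrease. With D and K tuned so that K·D is on the order of δ, this is, at a high level, the mechanism behind the O(ϵ^{-3} δ^{-1}) query complexity stated in the abstract above.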