A simpler approach to accelerated optimization: iterative averaging meets optimism

Authors: Pooria Joulani, Anant Raj, Andras Gyorgy, Csaba Szepesvari

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, we show that there is a simpler approach to acceleration: applying optimistic online learning algorithms and querying the gradient oracle at the online average of the intermediate optimization iterates. In particular, we tighten a recent result of Cutkosky (2019) to demonstrate theoretically that online iterate averaging results in a reduced optimization gap, independently of the algorithm involved. We show that carefully combining this technique with existing generic optimistic online learning algorithms yields the optimal accelerated rates for optimizing strongly-convex and non-strongly-convex, possibly composite objectives, with deterministic as well as stochastic first-order oracles. We further extend this idea to variance-reduced optimization. Finally, we also provide universal algorithms that achieve the optimal rate for smooth and non-smooth composite objectives simultaneously without further tuning, generalizing the results of Kavis et al. (2019) and solving a number of their open problems.
Researcher Affiliation | Collaboration | 1) DeepMind, London, UK; 2) Max Planck Institute for Intelligent Systems, Tübingen, Germany (work done during an internship at DeepMind); 3) University of Alberta, Edmonton, AB, Canada.
Pseudocode | Yes | Algorithm 1 (Vanilla Online-to-Batch), Algorithm 2 (Anytime Online-to-Batch; Cutkosky, 2019), and Algorithm 3 (Variance-Reduced Anytime Online-to-Batch with Negative Momentum). Illustrative sketches of the averaging and optimistic updates follow this table.
Open Source Code | No | No explicit statement about providing open-source code or a link to a code repository was found.
Open Datasets | No | The paper is theoretical, focusing on algorithm design and convergence proofs. It does not mention the use of any public or open datasets, nor does it provide access information for any dataset.
Dataset Splits | No | The paper is theoretical and does not conduct experiments; therefore, it does not specify training, validation, or test dataset splits.
Hardware Specification | No | No specific hardware details (such as GPU/CPU models, processors, or cloud resources) used for running experiments were mentioned in the paper.
Software Dependencies | No | The paper mentions no software dependencies or version numbers that would be needed to replicate any computational work.
Experiment Setup | No | The paper is theoretical and does not describe experimental setup details such as hyperparameter values, model initialization, or training configurations.
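
The paper's central recipe, referenced as Algorithm 2 in the pseudocode row above, is easy to sketch: run any online learner, but query the gradient oracle at the running weighted average of its iterates and return that average. The Python sketch below is illustrative only, assuming plain online gradient descent as the base learner, uniform averaging weights, and an unconstrained domain; the function and parameter names (grad, w0, lr, T, weights) are ours, not the paper's.

```python
import numpy as np

def anytime_online_to_batch(grad, w0, lr=0.1, T=1000, weights=None):
    """Illustrative sketch of anytime online-to-batch conversion:
    the gradient oracle is queried at the running weighted average of the
    online learner's iterates, and that average is returned as the answer.
    Plain online gradient descent stands in for the online learner here."""
    if weights is None:
        weights = np.ones(T)                  # uniform averaging weights a_t = 1
    w = np.asarray(w0, dtype=float).copy()    # online learner's iterate w_t
    x = w.copy()                              # running average x_t (the query point)
    weight_sum = 0.0
    for t in range(T):
        weight_sum += weights[t]
        x += (weights[t] / weight_sum) * (w - x)  # x_t = weighted avg of w_1..w_t
        g = grad(x)                           # gradient oracle queried at the average
        w = w - lr * weights[t] * g           # online step on the linear loss a_t * <g_t, .>
    return x                                  # the averaged iterate is the output
```

As a usage example, minimizing f(x) = ||x||^2 / 2 with grad = lambda x: x drives the returned average toward zero; the reduced-optimization-gap guarantee discussed in the abstract concerns this averaged point, not the raw online iterates.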
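The "optimism" half of the title enters by swapping the base learner for an optimistic online method that uses the previous gradient as a hint for the next one. The sketch below is a rough, hedged illustration of that combination, assuming optimistic online gradient steps in the style of Rakhlin and Sridharan, linearly growing weights a_t = t, and a fixed step size; the precise weight and step-size schedules that yield the paper's optimal accelerated and universal rates are derived there and are not reproduced here.

```python
import numpy as np

def optimistic_anytime_sketch(grad, w0, lr=0.01, T=1000):
    """Rough sketch: anytime iterate averaging with an optimistic online learner.
    The learner takes an extra 'optimistic' step using the last observed
    gradient as a prediction of the next one before the average is formed."""
    z = np.asarray(w0, dtype=float).copy()  # learner's base sequence
    x = z.copy()                            # running average / query point
    hint = np.zeros_like(z)                 # prediction of the next gradient
    weight_sum = 0.0
    for t in range(1, T + 1):
        a = float(t)                        # linearly growing weights a_t = t
        w = z - lr * a * hint               # optimistic step using the hint
        weight_sum += a
        x += (a / weight_sum) * (w - x)     # x_t = weighted average of w_1..w_t
        g = grad(x)                         # gradient oracle at the average
        z = z - lr * a * g                  # standard online update with the true gradient
        hint = g                            # next round's hint is the current gradient
    return x
```

Because the weights grow with t, the effective step on round t grows as well, so in any actual run lr must be chosen small relative to T and the objective's smoothness constant; the paper's analysis makes that choice precise.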