Online to Offline Conversions, Universality and Adaptive Minibatch Sizes
Authors: Kfir Y. Levy
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We present an approach towards convex optimization that relies on a novel scheme which converts adaptive online algorithms into offline methods. In the offline optimization setting, our derived methods are shown to obtain favourable adaptive guarantees which depend on the harmonic sum of the queried gradients. We further show that our methods implicitly adapt to the objective's structure: in the smooth case fast convergence rates are ensured without any prior knowledge of the smoothness parameter, while still maintaining guarantees in the non-smooth setting. Our approach has a natural extension to the stochastic setting, resulting in a lazy version of SGD (stochastic GD), where minibatches are chosen adaptively depending on the magnitude of the gradients, thus providing a principled approach towards choosing minibatch sizes. On the technical side, our online to offline conversion schemes employ three simultaneous mechanisms: an adaptive online algorithm used in conjunction with gradient normalization and with a respective importance weighting. To the best of our knowledge the combination of the above techniques is novel, and we believe it might also find use in other scenarios. This paper is organized as follows. In Sections 2 and 3, we present our methods for the offline convex/strongly-convex settings. Section 4 describes our methods for the stochastic setting, and Section 5 concludes. Extensions and a preliminary experimental study appear in the Appendix. (An illustrative sketch of the adaptive-minibatch idea appears after this table.) |
| Researcher Affiliation | Academia | Kfir Y. Levy Department of Computer Science, ETH Zürich. yehuda.levy@inf.ethz.ch |
| Pseudocode | Yes | Algorithm 1 Adaptive Gradient Descent (AdaGrad) ... Algorithm 2 Adaptive Normalized Gradient Descent (AdaNGDk) ... Algorithm 3 Strongly-Convex AdaNGD (SC-AdaNGDk) ... Algorithm 4 Lazy Stochastic Gradient Descent (LazySGD) ... Algorithm 5 Adaptive Estimate (AE) (see the AdaGrad sketch after this table) |
| Open Source Code | No | No explicit statement or link for open-source code release is provided in the paper. It only mentions "Extensions and a preliminary experimental study appear in the Appendix," but does not state that code will be released or provide a link. |
| Open Datasets | No | The paper is theoretical and focuses on algorithm design and proofs. It does not mention the use of any specific dataset, public or otherwise, for training or evaluation, as no empirical experiments are conducted within the provided main text. |
| Dataset Splits | No | The paper is theoretical and focuses on algorithm design and proofs. It does not mention any training, validation, or test dataset splits, as it does not conduct empirical experiments in the provided main text. |
| Hardware Specification | No | The paper is theoretical and focuses on algorithm design and proofs. It does not mention any specific hardware specifications used for running experiments, as no empirical experiments are described in the provided main text. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithm design and proofs. It does not list any specific software dependencies with version numbers, as no empirical experiments are described in the provided main text. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and proofs. It does not provide details on an experimental setup, hyperparameters, or system-level training settings, as no empirical experiments are described in the provided main text. |
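
The Pseudocode row above lists Algorithm 1, an adaptive gradient descent with a scalar step size. The snippet below is a minimal sketch of that style of update, assuming Euclidean projection and an illustrative step-size constant; the function names `adagrad_scalar` and `project_ball` are my own and are not taken from the paper.

```python
import numpy as np

def adagrad_scalar(grad, project, x0, num_rounds, diameter):
    """Scalar AdaGrad sketch: the step size shrinks with the cumulative
    squared norm of the observed gradients; the average iterate is returned."""
    x = np.asarray(x0, dtype=float)
    iterates = [x.copy()]
    sq_grad_sum = 0.0
    for _ in range(num_rounds):
        g = grad(x)                                       # query a (sub)gradient at the current point
        sq_grad_sum += float(np.dot(g, g))                # accumulate squared gradient norms
        eta = diameter / np.sqrt(2.0 * sq_grad_sum + 1e-12)  # adaptive step size (constant is illustrative)
        x = project(x - eta * g)                          # projected gradient step
        iterates.append(x.copy())
    return np.mean(iterates[:-1], axis=0)                 # average of the queried points

# Example usage: minimize ||x||^2 over the Euclidean unit ball.
if __name__ == "__main__":
    project_ball = lambda y: y / max(1.0, float(np.linalg.norm(y)))
    x_bar = adagrad_scalar(grad=lambda x: 2.0 * x,
                           project=project_ball,
                           x0=np.array([0.9, -0.5]),
                           num_rounds=200,
                           diameter=2.0)
    print(x_bar)
```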
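The Research Type row quotes the abstract's description of a lazy SGD variant whose minibatch sizes adapt to the magnitude of the gradients. The sketch below illustrates that idea under a simple stopping rule of my own choosing (average fresh stochastic gradients until the estimate's norm dominates an assumed noise level); it is not a transcription of the paper's LazySGD or Adaptive Estimate pseudocode, and `noise_bound` is a hypothetical parameter.

```python
import numpy as np

def adaptive_minibatch_gradient(stoch_grad, x, noise_bound, max_samples=1024):
    """Average stochastic gradients at x until the running estimate's norm
    dominates the assumed estimation error noise_bound / sqrt(n). Small true
    gradients therefore trigger larger minibatches automatically."""
    total = np.zeros_like(x, dtype=float)
    n = 0
    while n < max_samples:
        total += stoch_grad(x)
        n += 1
        if np.linalg.norm(total / n) >= noise_bound / np.sqrt(n):
            break                                         # signal dominates noise: stop sampling
    return total / n, n

def lazy_sgd(stoch_grad, x0, step_size, num_steps, noise_bound):
    """SGD whose per-step minibatch size is picked by the rule above."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_steps):
        g, batch_size = adaptive_minibatch_gradient(stoch_grad, x, noise_bound)
        x = x - step_size * g                             # plain (unprojected) gradient step
    return x

# Example usage: noisy gradients of f(x) = 0.5 * ||x||^2.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy_grad = lambda x: x + 0.1 * rng.standard_normal(x.shape)
    print(lazy_sgd(noisy_grad, x0=np.array([1.0, -2.0]),
                   step_size=0.1, num_steps=100, noise_bound=0.1))
```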