Online to Offline Conversions, Universality and Adaptive Minibatch Sizes
Authors: Kfir Y. Levy
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We present an approach towards convex optimization that relies on a novel scheme which converts adaptive online algorithms into offline methods. In the offline optimization setting, our derived methods are shown to obtain favourable adaptive guarantees which depend on the harmonic sum of the queried gradients. We further show that our methods implicitly adapt to the objective's structure: in the smooth case fast convergence rates are ensured without any prior knowledge of the smoothness parameter, while still maintaining guarantees in the non-smooth setting. Our approach has a natural extension to the stochastic setting, resulting in a lazy version of SGD (stochastic GD), where minibatches are chosen adaptively depending on the magnitude of the gradients, thus providing a principled approach towards choosing minibatch sizes. On the technical side, our online to offline conversion schemes employ three simultaneous mechanisms: an adaptive online algorithm used in conjunction with gradient normalization and with a respective importance weighting. To the best of our knowledge the combination of the above techniques is novel, and we believe it might also find use in other scenarios. This paper is organized as follows. In Sections 2 and 3, we present our methods for the offline convex/strongly-convex settings. Section 4 describes our methods for the stochastic setting, and Section 5 concludes. Extensions and a preliminary experimental study appear in the Appendix. (An illustrative sketch of the adaptive-minibatch idea appears after this table.) |
| Researcher Affiliation | Academia | Kfir Y. Levy Department of Computer Science, ETH Zürich. yehuda.levy@inf.ethz.ch |
| Pseudocode | Yes | Algorithm 1 Adaptive Gradient Descent (AdaGrad) ... Algorithm 2 Adaptive Normalized Gradient Descent (AdaNGDk) ... Algorithm 3 Strongly-Convex AdaNGD (SC-AdaNGDk) ... Algorithm 4 Lazy Stochastic Gradient Descent (LazySGD) ... Algorithm 5 Adaptive Estimate (AE) (see the AdaGrad sketch after this table) |
| Open Source Code | No | No explicit statement or link for open-source code release is provided in the paper. It only mentions "Extensions and a preliminary experimental study appear in the Appendix," but does not state that code will be released or provide a link. |
| Open Datasets | No | The paper is theoretical and focuses on algorithm design and proofs. It does not mention the use of any specific dataset, public or otherwise, for training or evaluation, as no empirical experiments are conducted within the provided main text. |
| Dataset Splits | No | The paper is theoretical and focuses on algorithm design and proofs. It does not mention any training, validation, or test dataset splits, as it does not conduct empirical experiments in the provided main text. |
| Hardware Specification | No | The paper is theoretical and focuses on algorithm design and proofs. It does not mention any specific hardware specifications used for running experiments, as no empirical experiments are described in the provided main text. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithm design and proofs. It does not list any specific software dependencies with version numbers, as no empirical experiments are described in the provided main text. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and proofs. It does not provide details on an experimental setup, hyperparameters, or system-level training settings, as no empirical experiments are described in the provided main text. |
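
The Pseudocode row above lists Algorithm 1, an adaptive gradient descent with a scalar step size. The snippet below is a minimal sketch of that style of update, assuming Euclidean projection and an illustrative step-size constant; the function names `adagrad_scalar` and `project_ball` are my own and are not taken from the paper.

```python
import numpy as np

def adagrad_scalar(grad, project, x0, num_rounds, diameter):
    """Scalar AdaGrad sketch: the step size shrinks with the cumulative
    squared norm of the observed gradients; the average iterate is returned."""
    x = np.asarray(x0, dtype=float)
    iterates = [x.copy()]
    sq_grad_sum = 0.0
    for _ in range(num_rounds):
        g = grad(x)                                       # query a (sub)gradient at the current point
        sq_grad_sum += float(np.dot(g, g))                # accumulate squared gradient norms
        eta = diameter / np.sqrt(2.0 * sq_grad_sum + 1e-12)  # adaptive step size (constant is illustrative)
        x = project(x - eta * g)                          # projected gradient step
        iterates.append(x.copy())
    return np.mean(iterates[:-1], axis=0)                 # average of the queried points

# Example usage: minimize ||x||^2 over the Euclidean unit ball.
if __name__ == "__main__":
    project_ball = lambda y: y / max(1.0, float(np.linalg.norm(y)))
    x_bar = adagrad_scalar(grad=lambda x: 2.0 * x,
                           project=project_ball,
                           x0=np.array([0.9, -0.5]),
                           num_rounds=200,
                           diameter=2.0)
    print(x_bar)
```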
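The Research Type row quotes the abstract's description of a lazy SGD variant whose minibatch sizes adapt to the magnitude of the gradients. The sketch below illustrates that idea under a simple stopping rule of my own choosing (average fresh stochastic gradients until the estimate's norm dominates an assumed noise level); it is not a transcription of the paper's LazySGD or Adaptive Estimate pseudocode, and `noise_bound` is a hypothetical parameter.

```python
import numpy as np

def adaptive_minibatch_gradient(stoch_grad, x, noise_bound, max_samples=1024):
    """Average stochastic gradients at x until the running estimate's norm
    dominates the assumed estimation error noise_bound / sqrt(n). Small true
    gradients therefore trigger larger minibatches automatically."""
    total = np.zeros_like(x, dtype=float)
    n = 0
    while n < max_samples:
        total += stoch_grad(x)
        n += 1
        if np.linalg.norm(total / n) >= noise_bound / np.sqrt(n):
            break                                         # signal dominates noise: stop sampling
    return total / n, n

def lazy_sgd(stoch_grad, x0, step_size, num_steps, noise_bound):
    """SGD whose per-step minibatch size is picked by the rule above."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_steps):
        g, batch_size = adaptive_minibatch_gradient(stoch_grad, x, noise_bound)
        x = x - step_size * g                             # plain (unprojected) gradient step
    return x

# Example usage: noisy gradients of f(x) = 0.5 * ||x||^2.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy_grad = lambda x: x + 0.1 * rng.standard_normal(x.shape)
    print(lazy_sgd(noisy_grad, x0=np.array([1.0, -2.0]),
                   step_size=0.1, num_steps=100, noise_bound=0.1))
```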