Online Adaptive Methods, Universality and Acceleration

Authors: Kfir Y. Levy, Alp Yurtsever, Volkan Cevher

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "An empirical examination of our method demonstrates its applicability to the above mentioned scenarios and corroborates our theoretical findings. ... In Section 5 we present our empirical study ... Figure 1: Comparison of universal methods at a smooth (top) and a non-smooth (bottom) problem."
Researcher Affiliation | Academia | Kfir Y. Levy (ETH Zurich, yehuda.levy@inf.ethz.ch), Alp Yurtsever (EPFL, alp.yurtsever@epfl.ch), Volkan Cevher (EPFL, volkan.cevher@epfl.ch)
Pseudocode | Yes | "Algorithm 1 Adaptive Gradient Method (AdaGrad) ... Algorithm 2 Accelerated Adaptive Gradient Method (AcceleGrad)"
Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository.
Open Datasets | No | "We synthetically generate matrix A ∈ R^{n×d} and a point of interest x ∈ R^d randomly, with entries independently drawn from standard Gaussian distribution. ... In the appendix we show results on a real dataset which demonstrate the appeal of AcceleGrad in the large-minibatch regime." (No specific dataset name, link, or citation for public access is provided for the real dataset, and the primary data is synthetic.)
Dataset Splits | No | The paper does not explicitly describe training, validation, and test dataset splits with percentages, sample counts, or citations to predefined splits.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments (e.g., GPU/CPU models, memory, or machine type).
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9).
Experiment Setup | Yes | "All methods are initialized at the origin, and we choose K as the ℓ2 norm ball of diameter D. ... The parameter ρ denotes the ratio between D/2 and the distance between initial point and the solution. Parameter D plays a major role on the step-size of AdaGrad and AcceleGrad."
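To make the quoted experiment setup concrete, the following is a minimal sketch (not the authors' released code, since none is linked) of the synthetic problem and an AdaGrad-style baseline: Gaussian data A and a point of interest x, a least-squares objective, the constraint set K taken as the ℓ2 ball of diameter D, and initialization at the origin. The problem sizes, the least-squares loss, the choice of D, and the scalar (global step-size) AdaGrad variant are illustrative assumptions; the exact updates are given in the paper's Algorithm 1 and Algorithm 2.

```python
# Sketch of the synthetic setup quoted in the table above.
# Assumptions (not from the paper): n, d, the least-squares objective,
# D = 2 * ||x_star||, and the scalar-step-size AdaGrad variant.
import numpy as np

rng = np.random.default_rng(0)

n, d = 200, 50                          # assumed problem sizes
A = rng.standard_normal((n, d))         # entries i.i.d. standard Gaussian
x_star = rng.standard_normal(d)         # "point of interest"
b = A @ x_star                          # least-squares targets

D = 2.0 * np.linalg.norm(x_star)        # assumed diameter of the feasible ball K

def project_l2_ball(x, radius):
    """Euclidean projection onto the l2 ball of given radius, centered at 0."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def grad(x):
    """Gradient of f(x) = 0.5 * ||Ax - b||^2."""
    return A.T @ (A @ x - b)

x = np.zeros(d)                         # all methods are initialized at the origin
G2 = 0.0                                # running sum of squared gradient norms
for t in range(1000):
    g = grad(x)
    G2 += g @ g
    eta = D / np.sqrt(G2 + 1e-12)       # AdaGrad-style step size scaled by D
    x = project_l2_ball(x - eta * g, D / 2.0)

print("final objective:", 0.5 * np.linalg.norm(A @ x - b) ** 2)
```

With x initialized at the origin and D set to twice the distance to the solution, the quoted parameter ρ = (D/2) / ||x0 − x*|| equals 1 in this sketch; the paper's experiments vary D (and hence ρ) to study its effect on the step-size of AdaGrad and AcceleGrad.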