Accelerating Rescaled Gradient Descent: Fast Optimization of Smooth Functions

Authors: Ashia C. Wilson, Lester Mackey, Andre Wibisono

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We provide several examples of strongly smooth loss functions in machine learning and numerical experiments that verify our theoretical findings."
Researcher Affiliation | Collaboration | Ashia C. Wilson (Microsoft Research, ashia.wilson@microsoft.com); Lester Mackey (Microsoft Research, lmackey@microsoft.com); Andre Wibisono (Georgia Tech, wibisono@gatech.edu)
Pseudocode | Yes | Algorithm 1 (Nesterov-style accelerated rescaled gradient descent), reconstructed below; a Python sketch follows the table.

Algorithm 1: Nesterov-style accelerated rescaled gradient descent
Require: f satisfies (13) and h satisfies D_h(x, y) ≥ (1/p)‖x − y‖^p.
1: Set x_0 = z_0, A_k = (δ/p)^p k^(p), α_k = (A_{k+1} − A_k)/δ, τ_k = α_k/A_{k+1}, and δ^{p/(p−1)} = η^{1/(p−1)}/2.
2: for k = 1, ..., K do
3:   x_k = δ τ_k z_k + (1 − δ τ_k) y_k
4:   z_{k+1} = argmin_{z ∈ X} { α_k ⟨∇f(x_k), z⟩ + (1/δ) D_h(z, z_k) }
5:   y_{k+1} = x_k − η^{1/(p−1)} B^{−1} ∇f(x_k)/‖∇f(x_k)‖^{(p−2)/(p−1)}
6: return y_K
Open Source Code | Yes | "The code for these experiments can be found here: https://github.com/aswilson07/ARGD.git."
Open Datasets | No | The paper states: "For the logistic and ℓ4 losses, we use the same code, plots, and experimental methodology of Zhang et al. [36] (including data and step-size choice), adding to it (A)RGD." It mentions using data from [36] but does not provide direct access information (link, DOI, repository, or an explicit citation for the dataset itself).
Dataset Splits | No | The paper describes the data generation process but does not provide specific details on training, validation, or test splits, or on how data was partitioned for the experiments.
Hardware Specification | No | The paper describes numerical experiments but does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used to run them.
Software Dependencies | No | The paper mentions that code is available on GitHub but does not explicitly list software dependencies with specific version numbers in the text.
Experiment Setup | No | The paper mentions step-size choices and constraints (e.g., the "largest step-size was chosen subject to the algorithm not diverging"), but it does not provide specific numerical values for hyperparameters or other detailed system-level training settings used in the experiments.
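
The reconstructed pseudocode above translates fairly directly into code. Below is a minimal, unofficial Python sketch of Algorithm 1, assuming an unconstrained Euclidean setting with B = I and h(x) = (1/p)‖x‖^p (so the mirror step on line 4 has a closed form), and reading k^(p) as the rising factorial k(k+1)···(k+p−1). Names such as accelerated_rgd and grad_f are illustrative and do not come from the authors' ARGD repository.

import numpy as np

def accelerated_rgd(grad_f, x0, p=2, eta=0.1, num_iters=100):
    """Sketch of Algorithm 1 (accelerated rescaled gradient descent).

    Assumptions (not taken from the paper's code): unconstrained domain,
    Euclidean norm with B = I, distance generator h(x) = (1/p)||x||^p so the
    mirror step has a closed form, integer smoothness order p >= 2, and
    k^(p) read as the rising factorial k (k+1) ... (k+p-1).
    """
    # delta is coupled to eta via delta^(p/(p-1)) = eta^(1/(p-1)) / 2
    delta = (eta ** (1.0 / (p - 1)) / 2.0) ** ((p - 1.0) / p)

    def A(k):
        # A_k = (delta/p)^p * k^(p), with k^(p) the rising factorial
        return (delta / p) ** p * np.prod([k + i for i in range(p)])

    def grad_h_inv(g):
        # Inverse of grad h for h(x) = (1/p)||x||^p: z = g / ||g||^((p-2)/(p-1))
        n = np.linalg.norm(g)
        return g if n == 0 else g / n ** ((p - 2.0) / (p - 1.0))

    x = z = y = np.asarray(x0, dtype=float)
    grad_h_z = np.linalg.norm(z) ** (p - 2) * z  # grad h(z_0)

    for k in range(1, num_iters + 1):
        alpha_k = (A(k + 1) - A(k)) / delta
        tau_k = alpha_k / A(k + 1)

        # Line 3: couple the z- and y-sequences
        x = delta * tau_k * z + (1.0 - delta * tau_k) * y

        g = grad_f(x)
        gnorm = np.linalg.norm(g)

        # Line 4: mirror step, maintained through the dual variable grad h(z)
        grad_h_z = grad_h_z - delta * alpha_k * g
        z = grad_h_inv(grad_h_z)

        # Line 5: rescaled gradient step (p = 2 recovers a plain gradient step)
        if gnorm > 0:
            y = x - eta ** (1.0 / (p - 1)) * g / gnorm ** ((p - 2.0) / (p - 1.0))

    return y

# Illustrative toy usage on the quartic objective f(x) = (1/4)||x||^4, an
# l4-style loss in the spirit of the paper's experiments (step size here is
# arbitrary, not the tuned value from the experiments):
#   y = accelerated_rgd(lambda x: np.linalg.norm(x) ** 2 * x,
#                       x0=np.ones(10), p=4, eta=0.5, num_iters=200)

The mirror step is carried out in the dual (on grad h(z)) because, for this choice of h, inverting grad h is a one-line rescaling; for a constrained domain X or a different h, line 4 would instead require the corresponding Bregman projection.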