Accelerating Rescaled Gradient Descent: Fast Optimization of Smooth Functions

Authors: Ashia C. Wilson, Lester Mackey, Andre Wibisono

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We provide several examples of strongly smooth loss functions in machine learning and numerical experiments that verify our theoretical findings."
Researcher Affiliation | Collaboration | Ashia C. Wilson (Microsoft Research, ashia.wilson@microsoft.com); Lester Mackey (Microsoft Research, lmackey@microsoft.com); Andre Wibisono (Georgia Tech, wibisono@gatech.edu)
Pseudocode | Yes | Algorithm 1 (Nesterov-style accelerated rescaled gradient descent), reconstructed below; a Python sketch follows the table.

Algorithm 1: Nesterov-style accelerated rescaled gradient descent
Require: f satisfies (13) and h satisfies D_h(x, y) ≥ (1/p)‖x − y‖^p.
1: Set x_0 = z_0, A_k = (δ/p)^p k^(p), α_k = (A_{k+1} − A_k)/δ, τ_k = α_k/A_{k+1}, and δ^{p/(p−1)} = η^{1/(p−1)}/2.
2: for k = 1, ..., K do
3:   x_k = δ τ_k z_k + (1 − δ τ_k) y_k
4:   z_{k+1} = argmin_{z ∈ X} { α_k ⟨∇f(x_k), z⟩ + (1/δ) D_h(z, z_k) }
5:   y_{k+1} = x_k − η^{1/(p−1)} B^{−1} ∇f(x_k)/‖∇f(x_k)‖^{(p−2)/(p−1)}
6: return y_K
Open Source Code | Yes | "The code for these experiments can be found here: https://github.com/aswilson07/ARGD.git."
Open Datasets | No | The paper states: "For the logistic and ℓ4 losses, we use the same code, plots, and experimental methodology of Zhang et al. [36] (including data and step-size choice), adding to it (A)RGD." It mentions using data from [36] but does not provide direct access information (link, DOI, repository, or an explicit citation for the dataset itself).
Dataset Splits | No | The paper describes the data generation process but does not provide specific details on training, validation, or test splits, or on how data was partitioned for the experiments.
Hardware Specification | No | The paper describes numerical experiments but does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used to run them.
Software Dependencies | No | The paper mentions that code is available on GitHub but does not explicitly list software dependencies with specific version numbers in the text.
Experiment Setup | No | The paper mentions step-size choices and constraints (e.g., the "largest step-size was chosen subject to the algorithm not diverging"), but it does not provide specific numerical values for hyperparameters or other detailed system-level training settings used in the experiments.
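
The reconstructed pseudocode above translates fairly directly into code. Below is a minimal, unofficial Python sketch of Algorithm 1, assuming an unconstrained Euclidean setting with B = I and h(x) = (1/p)‖x‖^p (so the mirror step on line 4 has a closed form), and reading k^(p) as the rising factorial k(k+1)···(k+p−1). Names such as accelerated_rgd and grad_f are illustrative and do not come from the authors' ARGD repository.

import numpy as np

def accelerated_rgd(grad_f, x0, p=2, eta=0.1, num_iters=100):
    """Sketch of Algorithm 1 (accelerated rescaled gradient descent).

    Assumptions (not taken from the paper's code): unconstrained domain,
    Euclidean norm with B = I, distance generator h(x) = (1/p)||x||^p so the
    mirror step has a closed form, integer smoothness order p >= 2, and
    k^(p) read as the rising factorial k (k+1) ... (k+p-1).
    """
    # delta is coupled to eta via delta^(p/(p-1)) = eta^(1/(p-1)) / 2
    delta = (eta ** (1.0 / (p - 1)) / 2.0) ** ((p - 1.0) / p)

    def A(k):
        # A_k = (delta/p)^p * k^(p), with k^(p) the rising factorial
        return (delta / p) ** p * np.prod([k + i for i in range(p)])

    def grad_h_inv(g):
        # Inverse of grad h for h(x) = (1/p)||x||^p: z = g / ||g||^((p-2)/(p-1))
        n = np.linalg.norm(g)
        return g if n == 0 else g / n ** ((p - 2.0) / (p - 1.0))

    x = z = y = np.asarray(x0, dtype=float)
    grad_h_z = np.linalg.norm(z) ** (p - 2) * z  # grad h(z_0)

    for k in range(1, num_iters + 1):
        alpha_k = (A(k + 1) - A(k)) / delta
        tau_k = alpha_k / A(k + 1)

        # Line 3: couple the z- and y-sequences
        x = delta * tau_k * z + (1.0 - delta * tau_k) * y

        g = grad_f(x)
        gnorm = np.linalg.norm(g)

        # Line 4: mirror step, maintained through the dual variable grad h(z)
        grad_h_z = grad_h_z - delta * alpha_k * g
        z = grad_h_inv(grad_h_z)

        # Line 5: rescaled gradient step (p = 2 recovers a plain gradient step)
        if gnorm > 0:
            y = x - eta ** (1.0 / (p - 1)) * g / gnorm ** ((p - 2.0) / (p - 1.0))

    return y

# Illustrative toy usage on the quartic objective f(x) = (1/4)||x||^4, an
# l4-style loss in the spirit of the paper's experiments (step size here is
# arbitrary, not the tuned value from the experiments):
#   y = accelerated_rgd(lambda x: np.linalg.norm(x) ** 2 * x,
#                       x0=np.ones(10), p=4, eta=0.5, num_iters=200)

The mirror step is carried out in the dual (on grad h(z)) because, for this choice of h, inverting grad h is a one-line rescaling; for a constrained domain X or a different h, line 4 would instead require the corresponding Bregman projection.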