Accelerating Rescaled Gradient Descent: Fast Optimization of Smooth Functions
Authors: Ashia C. Wilson, Lester Mackey, Andre Wibisono
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide several examples of strongly smooth loss functions in machine learning and numerical experiments that verify our theoretical findings. |
| Researcher Affiliation | Collaboration | Ashia C. Wilson (Microsoft Research, ashia.wilson@microsoft.com); Lester Mackey (Microsoft Research, lmackey@microsoft.com); Andre Wibisono (Georgia Tech, wibisono@gatech.edu) |
| Pseudocode | Yes | Algorithm 1 (Nesterov-style accelerated rescaled gradient descent). Require: $f$ satisfies (13) and $h$ satisfies $D_h(x, y) \geq \frac{1}{p}\lVert x - y \rVert^p$. 1: Set $x_0 = z_0$, $A_k = (\delta/p)^p k^{(p)}$, $\alpha_k = \frac{A_{k+1} - A_k}{\delta}$, $\tau_k = \frac{\alpha_k}{A_{k+1}}$, and $\delta^{\frac{p}{p-1}} = \eta^{\frac{1}{p-1}}/2$. 2: for $k = 1, \ldots, K$ do 3: $x_k = \delta\tau_k z_k + (1 - \delta\tau_k) y_k$ 4: $z_{k+1} = \arg\min_{z \in \mathcal{X}} \big\{ \alpha_k \langle \nabla f(x_k), z \rangle + \frac{1}{\delta} D_h(z, z_k) \big\}$ 5: $y_{k+1} = x_k - \eta^{\frac{1}{p-1}} B^{-1} \nabla f(x_k) / \lVert \nabla f(x_k) \rVert^{\frac{p-2}{p-1}}$ 6: return $y_K$. (A runnable sketch of these updates appears after this table.) |
| Open Source Code | Yes | The code for these experiments can be found here: https://github.com/aswilson07/ARGD.git. |
| Open Datasets | No | For the logistic and ℓ4 losses, we use the same code, plots, and experimental methodology of Zhang et al. [36] (including data and step-size choice), adding to it (A)RGD. The paper mentions using data from [36] but does not provide direct access information (link, DOI, repository, or explicit citation for the dataset itself). |
| Dataset Splits | No | The paper describes the data generation process but does not provide specific details on training, validation, or test dataset splits or how data was partitioned for experiments. |
| Hardware Specification | No | The paper describes numerical experiments but does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running them. |
| Software Dependencies | No | The paper mentions that code is available on GitHub but does not explicitly list software dependencies with specific version numbers within the text. |
| Experiment Setup | No | The paper mentions step-size choices and constraints (e.g., the "largest step-size was chosen subject to the algorithm not diverging"), but it does not report specific hyperparameter values or other system-level training settings used in the experiments. |
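For readers who want to trace the updates in Algorithm 1, here is a minimal sketch, assuming the unconstrained Euclidean setting $\mathcal{X} = \mathbb{R}^n$ with $h(x) = \frac{1}{p}\lVert x \rVert_2^p$ (so the mirror step has a closed form) and $B = I$. The function name `argd`, the quartic test problem, and all default parameter values are our own illustrative choices, not the authors' released code (which lives at the GitHub link above).

```python
import numpy as np

def argd(grad_f, x0, p=4, eta=1e-3, K=100):
    """Sketch of Algorithm 1: Nesterov-style accelerated rescaled gradient
    descent, specialized to X = R^n, h(x) = (1/p)||x||_2^p, and B = I."""
    # Step-size coupling from the algorithm: delta^{p/(p-1)} = eta^{1/(p-1)} / 2.
    delta = (eta ** (1.0 / (p - 1)) / 2.0) ** ((p - 1.0) / p)

    def A(k):
        # A_k = (delta/p)^p * k^{(p)}, with k^{(p)} = k(k+1)...(k+p-1).
        return (delta / p) ** p * np.prod([k + i for i in range(p)])

    def grad_h(z):
        # Gradient of h(x) = (1/p)||x||_2^p.
        return np.linalg.norm(z) ** (p - 2) * z

    def grad_h_inv(w):
        # Inverse of grad_h: if w = ||z||^{p-2} z, then z = w * ||w||^{(2-p)/(p-1)}.
        nw = np.linalg.norm(w)
        return w * nw ** ((2.0 - p) / (p - 1.0)) if nw > 0 else w

    x = y = z = np.asarray(x0, dtype=float)
    for k in range(1, K + 1):
        alpha = (A(k + 1) - A(k)) / delta
        tau = alpha / A(k + 1)
        x = delta * tau * z + (1.0 - delta * tau) * y
        g = grad_f(x)
        # Mirror step: argmin_z alpha*<g, z> + (1/delta) D_h(z, z_k), which for
        # this h reduces to inverting grad_h at grad_h(z_k) - delta*alpha*g.
        z = grad_h_inv(grad_h(z) - delta * alpha * g)
        # Rescaled gradient step (with B = I).
        ng = np.linalg.norm(g)
        if ng > 0:
            y = x - eta ** (1.0 / (p - 1)) * g / ng ** ((p - 2.0) / (p - 1.0))
        else:
            y = x
    return y

# Illustrative use on an assumed quartic test problem f(x) = ||x||_2^4 / 4,
# whose gradient is ||x||^2 * x:
y_final = argd(lambda x: np.linalg.norm(x) ** 2 * x, np.ones(5), p=4, eta=1e-3, K=200)
```

The choice $h(x) = \frac{1}{p}\lVert x \rVert_2^p$ satisfies the required uniform convexity condition $D_h(x, y) \geq \frac{1}{p}\lVert x - y \rVert^p$ and makes the argmin in step 4 solvable by inverting $\nabla h$; other valid choices of $h$ would need their own mirror-step solver.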