Convergence of adaptive algorithms for constrained weakly convex optimization
Authors: Ahmet Alacaoglu, Yura Malitsky, Volkan Cevher
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we illustrate the applications and extensions of our results to specific problems and algorithms. ... Finally, in a numerical experiment for robust phase retrieval, we observe that AMSGrad is more robust to variation of initial step sizes, compared to SGD and SGD with momentum. ... 5 Numerical experiment: This section illustrates the potential advantages of adaptive algorithms, in particular AMSGrad, for solving a prototypical weakly convex problem, compared to SGD and SGD with momentum (also referred to in [26] as stochastic heavy ball: SHB). |
| Researcher Affiliation | Academia | Ahmet Alacaoglu University of Wisconsin-Madison alacaoglu@wisc.edu Yura Malitsky Linköping University yurii.malitskyi@liu.se Volkan Cevher EPFL volkan.cevher@epfl.ch |
| Pseudocode | Yes | Algorithm 1 AMSGrad [28] |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code for the described methodology, nor does it include a link to a code repository. |
| Open Datasets | No | The paper describes its data generation process: 'The data is generated as A = QD, with a standard normal distributed matrix Q ∈ R^{n×d} and D = linspace(1/κ, 1, d)... We generate the solution x∗ as a standard normal random vector with unit norm. The observations are generated as b = Ax∗ + δη, where elements of η ∈ R^n have distribution N(0, 25) and δ = diag(δ_1, . . . , δ_d) is such that |{i ∈ [n] : δ_i = 1}|/n = 0.2...'. While it cites papers for the robust phase retrieval problem, it does not provide concrete access information (link, DOI, repository, or standard name) for the specific dataset used in its numerical experiments (see the data-generation sketch below the table). |
| Dataset Splits | No | The paper describes data generation for its numerical experiment but does not provide any specific information about training, validation, or test dataset splits (percentages, sample counts, or explicit partitioning methodology). |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, processors, or memory used for running its experiments. |
| Software Dependencies | No | The paper does not list any specific software components with version numbers, such as programming languages, libraries, or solvers, used in the experiments. |
| Experiment Setup | Yes | For all algorithms, the step size is chosen as α_k = α_0/√k. We varied the initial step size between 0.01 and 10 for all algorithms, and we plotted the number of epochs that the algorithms take to reach f(x) − f∗ < 0.1. In terms of other algorithmic parameters, we use both β = 0.1 and β = 0.01 for SHB, as in [26], and β_1 = β_2 = 0.99, as is popular, for AMSGrad (see the AMSGrad sketch below the table). |
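
The data-generation procedure quoted in the Open Datasets row is described fully enough to be re-implemented from the text. Below is a minimal NumPy sketch of that procedure; it is illustrative, not the authors' code. It assumes the corruption mask has one entry per observation and that N(0, 25) denotes a normal distribution with variance 25 (standard deviation 5); all function and variable names are hypothetical.

```python
import numpy as np

def generate_robust_phase_retrieval_data(n, d, kappa, corruption_frac=0.2, seed=0):
    """Synthetic data following the generation procedure quoted above (sketch).

    Assumptions not fixed by the quoted text: the 0/1 corruption mask has n
    entries (one per observation), and N(0, 25) is read as variance 25.
    """
    rng = np.random.default_rng(seed)
    # A = Q D, with Q standard normal and D = diag(linspace(1/kappa, 1, d))
    Q = rng.standard_normal((n, d))
    D = np.diag(np.linspace(1.0 / kappa, 1.0, d))
    A = Q @ D
    # Ground-truth solution: standard normal vector, normalized to unit norm
    x_star = rng.standard_normal(d)
    x_star /= np.linalg.norm(x_star)
    # Sparse corruption: 20% of the mask entries are set to 1
    delta = np.zeros(n)
    delta[rng.choice(n, size=int(corruption_frac * n), replace=False)] = 1.0
    eta = rng.normal(0.0, 5.0, size=n)  # N(0, 25), i.e. std 5 (assumed reading)
    # Observations as quoted: b = A x* + delta * eta
    b = A @ x_star + delta * eta
    return A, b, x_star
```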
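For the Pseudocode and Experiment Setup rows, the following is a minimal sketch of the standard AMSGrad update (Reddi et al.), using the α_k = α_0/√k schedule and the β_1 = β_2 = 0.99 defaults quoted above. It omits the projection step that the paper uses for constrained problems, and `grad_fn` is a hypothetical stochastic-gradient oracle; this is not the authors' implementation.

```python
import numpy as np

def amsgrad(grad_fn, x0, n_iters, alpha0=0.1, beta1=0.99, beta2=0.99, eps=1e-8):
    """AMSGrad sketch with step size alpha_k = alpha0 / sqrt(k).

    grad_fn(x, k) returns a (stochastic) gradient estimate at x; the
    projection onto the constraint set analyzed in the paper is omitted.
    """
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)       # first-moment (momentum) estimate
    v = np.zeros_like(x)       # second-moment estimate
    v_hat = np.zeros_like(x)   # running max of v (the AMSGrad modification)
    for k in range(1, n_iters + 1):
        g = grad_fn(x, k)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        v_hat = np.maximum(v_hat, v)
        alpha_k = alpha0 / np.sqrt(k)
        x = x - alpha_k * m / (np.sqrt(v_hat) + eps)
    return x
```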