Convergence of adaptive algorithms for constrained weakly convex optimization

Authors: Ahmet Alacaoglu, Yura Malitsky, Volkan Cevher

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we illustrate the applications and extensions of our results to specific problems and algorithms. ... Finally, in a numerical experiment for robust phase retrieval, we observe that AMSGrad is more robust to variation of initial step sizes, compared to SGD and SGD with momentum. ... Section 5, Numerical experiment: This section illustrates the potential advantages of adaptive algorithms, in particular AMSGrad, for solving a prototypical weakly convex problem, compared to SGD and SGD with momentum (also referred to in [26] as stochastic heavy ball: SHB).
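For context, below is a minimal sketch of the objective behind this experiment: the robust phase retrieval loss f(x) = (1/n) Σᵢ |⟨aᵢ, x⟩² − bᵢ| that is standard in this literature. Treating this as the paper's exact formulation is an assumption, and the function name is hypothetical.

```python
import numpy as np

def robust_phase_retrieval_loss(A, b, x):
    # f(x) = (1/n) * sum_i |<a_i, x>^2 - b_i|; weakly convex as a composition
    # of the absolute value with a smooth map.
    # (Standard formulation in this literature; assumed, not quoted from the paper.)
    return np.mean(np.abs((A @ x) ** 2 - b))
```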
Researcher Affiliation | Academia | Ahmet Alacaoglu (University of Wisconsin-Madison, alacaoglu@wisc.edu); Yura Malitsky (Linköping University, yurii.malitskyi@liu.se); Volkan Cevher (EPFL, volkan.cevher@epfl.ch)
Pseudocode | Yes | Algorithm 1: AMSGrad [28]
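For reference, here is a minimal sketch of a projected AMSGrad step in the spirit of Algorithm 1 (AMSGrad is due to Reddi et al., cited as [28]). The Euclidean projection, the eps safeguard, and the absence of bias correction are assumptions of this sketch; the paper's Algorithm 1 may differ in such details.

```python
import numpy as np

def amsgrad(x0, grad, project, alpha0=1.0, beta1=0.99, beta2=0.99,
            eps=1e-8, num_iters=1000):
    # grad(x): stochastic gradient oracle; project(x): Euclidean projection
    # onto the constraint set (identity for unconstrained problems).
    x = x0.copy()
    m = np.zeros_like(x)       # first-moment (momentum) estimate
    v = np.zeros_like(x)       # second-moment estimate
    v_hat = np.zeros_like(x)   # running elementwise maximum of v
    for k in range(1, num_iters + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        v_hat = np.maximum(v_hat, v)      # the "max" step distinguishing AMSGrad from Adam
        alpha = alpha0 / np.sqrt(k)       # step-size schedule from the experiment section
        x = project(x - alpha * m / (np.sqrt(v_hat) + eps))
    return x
```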
Open Source Code | No | The paper does not provide any explicit statements about releasing source code for the described methodology, nor does it include a link to a code repository.
Open Datasets | No | The paper describes its data generation process: 'The data is generated as A = QD, with a standard normal distributed matrix Q ∈ Rn×d and D = linspace(1/κ, 1, d)... We generate the solution x∗ as a standard normal random vector with unit norm. The observations are generated as b = Ax∗ + δη, where elements of η ∈ Rn have distribution N(0, 25) and δ = diag(δ1, . . . , δn) is such that |{i ∈ [n] : δi = 1}|/n = 0.2...'. While it cites papers for the robust phase retrieval problem, it does not provide concrete access information (link, DOI, repository, or standard name) for the dataset used in its numerical experiments.
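A sketch of the quoted generation procedure, reading N(0, 25) as variance 25 (standard deviation 5) and sampling the corrupted 20% of indices uniformly at random; both readings, and the helper name generate_data, are assumptions.

```python
import numpy as np

def generate_data(n, d, kappa, corrupt_frac=0.2, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    Q = rng.standard_normal((n, d))                      # standard normal Q in R^{n x d}
    A = Q @ np.diag(np.linspace(1.0 / kappa, 1.0, d))    # A = Q D
    x_star = rng.standard_normal(d)
    x_star /= np.linalg.norm(x_star)                     # unit-norm solution
    eta = rng.normal(0.0, 5.0, size=n)                   # N(0, 25): std dev 5
    delta = np.zeros(n)
    corrupted = rng.choice(n, size=int(corrupt_frac * n), replace=False)
    delta[corrupted] = 1.0                               # |{i : delta_i = 1}| / n = 0.2
    b = A @ x_star + delta * eta                         # observations as quoted
    return A, b, x_star
```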
Dataset Splits | No | The paper describes data generation for its numerical experiment but does not provide any specific information about training, validation, or test dataset splits (percentages, sample counts, or explicit partitioning methodology).
Hardware Specification | No | The paper does not provide any specific hardware details, such as GPU/CPU models, processors, or memory, used for running its experiments.
Software Dependencies | No | The paper does not list any specific software components with version numbers, such as programming languages, libraries, or solvers, used in the experiments.
Experiment Setup | Yes | For all algorithms, the step size is chosen as αk = α0/√k. We varied the initial step size between 0.01 and 10 for all algorithms, and we plotted the number of epochs the algorithms take to reach f(x) − f∗ < 0.1. For the other algorithmic parameters, we use both β = 0.1 and β = 0.01 for SHB, as in [26], and the popular choice β1 = β2 = 0.99 for AMSGrad.
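A sketch of the quoted protocol as a generic driver: stochastic subgradient steps with αk = α0/√k, counting epochs until f(x) − f∗ < 0.1. The oracle interface, the logarithmic grid over [0.01, 10], and the name epochs_to_target are illustrative assumptions, not the paper's code.

```python
import numpy as np

def epochs_to_target(loss, subgrad, x0, f_star, alpha0, n,
                     target=0.1, max_epochs=200, seed=0):
    # loss(x): full objective f(x); subgrad(x, rng): one stochastic subgradient;
    # n stochastic steps count as one epoch. Returns the first epoch at which
    # f(x) - f_star < target (or max_epochs if the target is never reached).
    rng = np.random.default_rng(seed)
    x = x0.copy()
    k = 0
    for epoch in range(1, max_epochs + 1):
        for _ in range(n):
            k += 1
            x -= (alpha0 / np.sqrt(k)) * subgrad(x, rng)   # alpha_k = alpha0 / sqrt(k)
        if loss(x) - f_star < target:
            return epoch
    return max_epochs

# Initial step sizes swept between 0.01 and 10, as stated; the grid itself is an assumption.
initial_step_sizes = np.logspace(-2, 1, num=13)
```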