Anderson Acceleration of Proximal Gradient Methods

Authors: Vien Mai, Mikael Johansson

ICML 2020

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Numerical results confirm our theoretical developments." and "In this section, we perform experiments to validate our theoretical developments and to demonstrate that, despite sharing the same worst-case complexity, SHB can be better in terms of speed and robustness to problem and algorithm parameters than SGD." |
| Researcher Affiliation | Academia | "Division of Decision and Control Systems, EECS, KTH Royal Institute of Technology, Stockholm, Sweden. Correspondence to: V. V. Mai <maivv@kth.se>, M. Johansson <mikaelj@kth.se>." |
| Pseudocode | No | x_{k+1} = argmin_{x ∈ X} ⟨z_k, x − x_k⟩ + (1/(2α))‖x − x_k‖₂² (the algorithm is presented as mathematical equations and steps, not in a structured pseudocode block). |
| Open Source Code | No | The paper does not provide any statement or link regarding the public availability of its source code. |
| Open Datasets | No | "In each experiment, we set m = 300, n = 100 and select x uniformly from the unit sphere. We generate A as A = QD, where Q ∈ ℝ^{m×n} is a matrix with standard normal entries, and D is a diagonal matrix with linearly spaced elements between 1/κ and 1, with κ ≥ 1 playing the role of a condition number. The elements b_i of the vector b are generated as b_i = ⟨a_i, x⟩² + δζ_i, i = 1, …, m, where ζ_i ∼ N(0, 25) models the corruptions, and δ ∈ {0, 1} is a binary random variable taking the value 1 with probability p_fail, so that p_fail·m measurements are corrupted." (The paper describes how the dataset was generated for the experiments but does not provide any access information or state its public availability.) |
| Dataset Splits | No | The paper describes the problem as a stochastic optimization problem where the loss is parameterized by x on a sample s ∈ S. It mentions using "i.i.d. samples S_0, S_1, …, S_K", but does not specify traditional train/validation/test dataset splits, as the experiments involve generating data for a single optimization problem rather than training a model on a partitioned dataset. |
| Hardware Specification | No | The paper discusses numerical evaluations but does not provide any specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper mentions "PyTorch" in a footnote related to default momentum values, but it does not specify any software dependencies with version numbers for its experimental setup. |
| Experiment Setup | Yes | "In each experiment, we set m = 300, n = 100 and select x uniformly from the unit sphere. We generate A as A = QD... In each of our experiments, we set the stepsize as α_k = α_0/√(k+1), where α_0 is an initial stepsize... Within each individual run, we allow the considered stochastic methods to perform K = 400m iterations. We conduct 50 experiments for each stepsize and report the median of the epochs-to-ε-accuracy." |
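The update quoted in the Pseudocode row has a well-known closed form: minimizing ⟨z_k, x − x_k⟩ + (1/(2α))‖x − x_k‖₂² over x ∈ X amounts to a projected gradient-type step, x_{k+1} = Π_X(x_k − α z_k). The following is a minimal sketch of that step, assuming X is a Euclidean unit ball; the constraint set, function names, and radius are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Euclidean projection onto {x : ||x||_2 <= radius}.

    A hypothetical choice of the constraint set X, used only to make
    the update concrete; the paper does not fix X to a ball.
    """
    nrm = np.linalg.norm(x)
    return x if nrm <= radius else (radius / nrm) * x

def model_based_step(x_k, z_k, alpha, proj=project_ball):
    """One update x_{k+1} = argmin_{x in X} <z_k, x - x_k>
    + (1/(2*alpha)) * ||x - x_k||_2^2.

    Completing the square shows the minimizer is the projection of the
    unconstrained step x_k - alpha * z_k back onto X.
    """
    return proj(x_k - alpha * z_k)
```

With z_k a stochastic (sub)gradient direction and a diminishing stepsize such as α_k = α_0/√(k+1) (the schedule quoted under Experiment Setup), iterating this step gives the kind of projected stochastic method the experiments compare.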
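The data-generation procedure quoted under Open Datasets and Experiment Setup can be sketched in NumPy as follows. This is a reconstruction under stated assumptions, not the authors' code: the function name, default p_fail, and RNG seeding are hypothetical, while m = 300, n = 100, A = QD, the linearly spaced diagonal, the unit-sphere ground truth, and the N(0, 25) corruption model come from the quoted text.

```python
import numpy as np

def generate_problem(m=300, n=100, kappa=10.0, pfail=0.1, rng=None):
    """Sketch of the synthetic phase-retrieval-style data described
    in the paper: A = Q D with Q having i.i.d. standard normal entries
    and D diagonal with entries linearly spaced in [1/kappa, 1];
    b_i = <a_i, x>^2, corrupted with probability pfail by N(0, 25) noise.
    """
    rng = rng or np.random.default_rng(0)
    Q = rng.standard_normal((m, n))
    D = np.diag(np.linspace(1.0 / kappa, 1.0, n))
    A = Q @ D
    # Ground-truth signal drawn uniformly from the unit sphere.
    x_true = rng.standard_normal(n)
    x_true /= np.linalg.norm(x_true)
    zeta = rng.normal(0.0, 5.0, m)        # N(0, 25): std dev 5
    delta = rng.random(m) < pfail         # ~ pfail * m corrupted entries
    b = (A @ x_true) ** 2 + delta * zeta
    return A, b, x_true
```

The κ parameter controls the conditioning of A through the spread of D's diagonal, which is what the report's quoted setup varies when probing robustness to problem parameters.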