Anderson Acceleration of Proximal Gradient Methods
Authors: Vien Mai, Mikael Johansson
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical results confirm our theoretical developments. In this section, we perform experiments to validate our theoretical developments and to demonstrate that, despite sharing the same worst-case complexity, SHB can be better in terms of speed and robustness to problem and algorithm parameters than SGD. |
| Researcher Affiliation | Academia | 1Division of Decision and Control Systems, EECS, KTH Royal Institute of Technology, Stockholm, Sweden. Correspondence to: V. V. Mai <maivv@kth.se>, M. Johansson <mikaelj@kth.se>. |
| Pseudocode | No | $x_{k+1} = \operatorname{argmin}_{x \in \mathcal{X}} \left\{ \langle z_k, x - x_k \rangle + \tfrac{1}{2\alpha} \lVert x - x_k \rVert_2^2 \right\}$ (The algorithm is presented as mathematical equations and steps, not in a structured pseudocode block; a hedged code sketch of this step is given after the table.) |
| Open Source Code | No | The paper does not provide any statement or link regarding the public availability of its source code. |
| Open Datasets | No | In each experiment, we set m = 300, n = 100 and select x uniformly from the unit sphere. We generate A as A = QD, where $Q \in \mathbb{R}^{m \times n}$ is a matrix with standard normal distributed entries, and D is a diagonal matrix with linearly spaced elements between $1/\kappa$ and 1, with $\kappa \geq 1$ playing the role of a condition number. The elements $b_i$ of the vector b are generated as $b_i = \langle a_i, x \rangle^2 + \delta \zeta_i$, $i = 1, \ldots, m$, where $\zeta_i \sim \mathcal{N}(0, 25)$ models the corruptions, and $\delta \in \{0, 1\}$ is a binary random variable taking the value 1 with probability $p_{\mathrm{fail}}$, so that $p_{\mathrm{fail}}\, m$ measurements are corrupted. (The paper describes how the dataset was generated for the experiments but does not provide any access information or state its public availability; a reconstruction of this procedure is sketched after the table.) |
| Dataset Splits | No | The paper describes the problem as a stochastic optimization problem where the loss is parameterized by x on a sample $s \in \mathcal{S}$. It mentions using 'i.i.d. samples $S_0, S_1, \ldots, S_K$', but does not specify traditional train/validation/test dataset splits, as the experiments involve generating data for a single optimization problem rather than training a model on a partitioned dataset. |
| Hardware Specification | No | The paper discusses numerical evaluations but does not provide any specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch' in a footnote related to default momentum values, but it does not specify any software dependencies with version numbers for its experimental setup. |
| Experiment Setup | Yes | In each experiment, we set m = 300, n = 100 and select x uniformly from the unit sphere. We generate A as A = QD... In each of our experiments, we set the stepsize as $\alpha_k = \alpha_0/\sqrt{k+1}$, where $\alpha_0$ is an initial stepsize... Within each individual run, we allow the considered stochastic methods to perform K = 400m iterations. We conduct 50 experiments for each stepsize and report the median of the epochs-to-$\epsilon$-accuracy. (An illustrative outline of this protocol is sketched after the table.) |
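
The data-generation procedure quoted in the Open Datasets row is detailed enough to re-implement. Below is a minimal NumPy sketch under the stated assumptions; the function name `generate_data`, the default values of `kappa` and `p_fail`, and the random-seed handling are illustrative choices, not taken from the paper.

```python
import numpy as np

def generate_data(m=300, n=100, kappa=10.0, p_fail=0.1, rng=None):
    """Sketch of the synthetic corrupted quadratic-measurement data described above.

    A = Q D, with Q having i.i.d. standard normal entries and D diagonal with
    entries linearly spaced between 1/kappa and 1; b_i = <a_i, x>^2 + delta_i * zeta_i,
    where zeta_i ~ N(0, 25) and delta_i = 1 with probability p_fail.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Ground-truth signal drawn uniformly from the unit sphere.
    x_true = rng.standard_normal(n)
    x_true /= np.linalg.norm(x_true)

    # Ill-conditioned measurement matrix A = Q D (kappa acts as a condition number).
    Q = rng.standard_normal((m, n))
    D = np.diag(np.linspace(1.0 / kappa, 1.0, n))
    A = Q @ D

    # Quadratic measurements with gross corruptions on roughly p_fail * m entries.
    zeta = rng.normal(loc=0.0, scale=5.0, size=m)    # N(0, 25): variance 25, std 5
    delta = (rng.random(m) < p_fail).astype(float)
    b = (A @ x_true) ** 2 + delta * zeta
    return A, b, x_true
```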
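
The update quoted in the Pseudocode row is a projected step: completing the square shows that minimizing $\langle z_k, x - x_k \rangle + \tfrac{1}{2\alpha} \lVert x - x_k \rVert_2^2$ over $x \in \mathcal{X}$ is the same as projecting $x_k - \alpha z_k$ onto $\mathcal{X}$. A one-line sketch, assuming a user-supplied projection (the `project` argument is illustrative and defaults to the unconstrained case $\mathcal{X} = \mathbb{R}^n$):

```python
import numpy as np

def prox_step(x_k, z_k, alpha, project=lambda x: x):
    """x_{k+1} = argmin_{x in X} <z_k, x - x_k> + ||x - x_k||^2 / (2 * alpha).

    Equivalent to the Euclidean projection of x_k - alpha * z_k onto X;
    the default `project` is the identity, i.e. X = R^n.
    """
    return project(x_k - alpha * z_k)
```

Here $z_k$ stands for the method's search direction (a stochastic subgradient for SGD, a momentum-averaged direction for SHB); the exact rule for forming $z_k$ is not quoted in this report.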
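
The Experiment Setup row fixes the stepsize schedule ($\alpha_k = \alpha_0/\sqrt{k+1}$), the iteration budget (K = 400m), and the aggregation (median over 50 runs). The outline below combines the two sketches above into one loop and is illustrative only: the search direction is a plain stochastic subgradient of the robust loss $|\langle a_i, x\rangle^2 - b_i|$, not the paper's SHB rule, and the reported metric is the distance to the planted signal rather than the paper's epochs-to-$\epsilon$-accuracy.

```python
import numpy as np

def run_trials(alpha0=0.1, epochs=400, n_trials=50, seed=0):
    """Illustrative protocol: K = epochs * m iterations per run with
    alpha_k = alpha0 / sqrt(k + 1); the median over n_trials runs is reported."""
    distances = []
    for trial in range(n_trials):
        rng = np.random.default_rng(seed + trial)
        A, b, x_true = generate_data(rng=rng)  # sketch defined above
        m, n = A.shape

        # Random unit-norm initialization (an assumption, not quoted from the paper).
        x = rng.standard_normal(n)
        x /= np.linalg.norm(x)

        for k in range(epochs * m):
            alpha_k = alpha0 / np.sqrt(k + 1)
            i = rng.integers(m)
            a_i, r = A[i], A[i] @ x
            # Stochastic subgradient of |<a_i, x>^2 - b_i| (placeholder direction).
            z_k = np.sign(r * r - b[i]) * 2.0 * r * a_i
            x = prox_step(x, z_k, alpha_k)  # sketch defined above

        # Distance to the ground truth, accounting for the sign ambiguity x -> -x.
        distances.append(min(np.linalg.norm(x - x_true), np.linalg.norm(x + x_true)))
    return float(np.median(distances))
```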