On the convergence of single-call stochastic extra-gradient methods

Authors: Yu-Guan Hsieh, Franck Iutzeler, Jérôme Malick, Panayotis Mertikopoulos

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Figure 1: Illustration of the performance of EG and 1-EG in the (a priori non-monotone) saddle-point problem L(θ, φ) = (ϵ1/2) θ⊤A1θ + (ϵ2/4)(θ⊤A2θ)² − (ϵ1/2) φ⊤B1φ − (ϵ2/4)(φ⊤B2φ)² + θ⊤Cφ on the full unconstrained space X = R^d = R^(d1+d2) with d1 = d2 = 1000 and A1, B1, A2, B2 ⪰ 0. We choose three situations representative of the settings considered in the paper: (a) linear convergence of the last iterate of the deterministic methods in strongly monotone problems; (b) the O(1/t) convergence of the ergodic average in monotone, deterministic problems; and (c) the O(1/t) local convergence rate of the method's last iterate in stochastic, non-monotone problems. For (a) and (b), the origin is the unique solution of (VI), and for (c) it is a regular solution thereof. We observe that 1-EG consistently outperforms EG in terms of oracle calls for a fixed step-size, and the observed rates are consistent with the rates reported in Table 1.
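To make the quoted objective concrete, here is a minimal NumPy sketch of L(θ, φ) and of the associated vector field V(x) = (∇θL, −∇φL) that the extra-gradient methods operate on. The function names, the stacked-variable convention, and the choices of ϵ1, ϵ2 and the matrices are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def loss(theta, phi, A1, A2, B1, B2, C, eps1, eps2):
    """L(theta, phi) from the Figure 1 caption; A1, A2, B1, B2 are
    assumed symmetric PSD, as the caption states."""
    return (eps1 / 2 * (theta @ A1 @ theta)
            + eps2 / 4 * (theta @ A2 @ theta) ** 2
            - eps1 / 2 * (phi @ B1 @ phi)
            - eps2 / 4 * (phi @ B2 @ phi) ** 2
            + theta @ C @ phi)

def vector_field(theta, phi, A1, A2, B1, B2, C, eps1, eps2):
    """V = (grad_theta L, -grad_phi L): the min-max gradient field,
    i.e. the operator that (VI) and the EG updates are written for."""
    g_theta = eps1 * (A1 @ theta) + eps2 * (theta @ A2 @ theta) * (A2 @ theta) + C @ phi
    g_phi   = eps1 * (B1 @ phi)   + eps2 * (phi @ B2 @ phi) * (B2 @ phi)   - C.T @ theta
    return np.concatenate([g_theta, g_phi])
```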
Researcher Affiliation | Academia | Yu-Guan Hsieh (Univ. Grenoble Alpes, LJK and ENS Paris, 38000 Grenoble, France; yu-guan.hsieh@ens.fr); Franck Iutzeler (Univ. Grenoble Alpes, LJK, 38000 Grenoble, France; franck.iutzeler@univ-grenoble-alpes.fr); Jérôme Malick (CNRS, LJK, 38000 Grenoble, France; jerome.malick@univ-grenoble-alpes.fr); Panayotis Mertikopoulos (Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LIG, 38000 Grenoble, France; panayotis.mertikopoulos@imag.fr)
Pseudocode | Yes | The Extra-Gradient algorithm. In the general framework outlined in the previous section, the Extra-Gradient (EG) algorithm of Korpelevich [23] can be stated in recursive form as Xt+1/2 = ΠX(Xt − γt Vt), Xt+1 = ΠX(Xt − γt Vt+1/2) (EG) ... Single-call variants of the Extra-Gradient algorithm. Concretely, we will examine the following family of single-call extra-gradient (1-EG) algorithms: 1. Past Extra-Gradient (PEG) [10, 18, 37]: Xt+1/2 = ΠX(Xt − γt Vt−1/2), Xt+1 = ΠX(Xt − γt Vt+1/2) (PEG) ... 2. Reflected Gradient (RG) [8, 13, 25]: Xt+1/2 = Xt − (Xt−1 − Xt), Xt+1 = ΠX(Xt − γt Vt+1/2) (RG) ... 3. Optimistic Gradient (OG) [14, 30, 31, 36]: Xt+1/2 = ΠX(Xt − γt Vt−1/2), Xt+1 = Xt+1/2 + γt Vt−1/2 − γt Vt+1/2 (OG)
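The quoted recursions translate directly into code. Below is a minimal sketch of EG and of the single-call PEG variant, assuming a stacked-variable field V like the one sketched above; the helper names, the warm-start convention for V at t = 0, and the fixed-horizon loop are illustrative choices, not the paper's implementation.

```python
import numpy as np

def extra_gradient(x0, V, project, gammas, T):
    """EG: two oracle calls per iteration (one at X_t, one at X_{t+1/2}).
    `project` is the Euclidean projection onto X (identity when X = R^d)."""
    x = x0
    for t in range(T):
        x_half = project(x - gammas[t] * V(x))      # X_{t+1/2} = Pi_X(X_t - g_t V_t)
        x = project(x - gammas[t] * V(x_half))      # X_{t+1}   = Pi_X(X_t - g_t V_{t+1/2})
    return x

def past_extra_gradient(x0, V, project, gammas, T):
    """PEG (a 1-EG method): reuse the previous leading-state gradient
    V_{t-1/2} to extrapolate, so only one fresh oracle call per iteration."""
    x = x0
    v_prev = V(x0)                                  # stands in for V_{-1/2} (warm start)
    for t in range(T):
        x_half = project(x - gammas[t] * v_prev)    # X_{t+1/2} = Pi_X(X_t - g_t V_{t-1/2})
        v_prev = V(x_half)                          # the single oracle call: V_{t+1/2}
        x = project(x - gammas[t] * v_prev)         # X_{t+1}   = Pi_X(X_t - g_t V_{t+1/2})
    return x
```

RG and OG follow the same template: RG replaces the projected extrapolation with the reflection Xt − (Xt−1 − Xt) and stores no past gradient, while OG keeps the PEG extrapolation but updates Xt+1 directly from Xt+1/2 without a second projection.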
Open Source Code | No | The paper does not make any explicit statement about releasing source code or provide a link to a code repository for the described methodology.
Open Datasets | No | The experiments use the synthetically defined saddle-point problem L(θ, φ) from the Figure 1 caption (quoted in full under Research Type above), not a named publicly available dataset.
Dataset Splits | No | The experiments are based on a synthetically defined saddle-point problem, L(θ, φ), rather than a dataset that requires explicit training, validation, or test splits.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or cloud resources) used for conducting the experiments.
Software Dependencies | No | The paper does not provide details on specific software dependencies or their version numbers used in the experiments.
Experiment Setup | Yes | The Figure 1 caption (quoted in full under Research Type above) specifies the setup: the saddle-point problem L(θ, φ), the unconstrained space X = R^d with d1 = d2 = 1000, and the three representative settings (a)-(c), with EG and 1-EG compared at fixed step sizes. The figure legends list the step-size grids per panel: (a) γ ∈ {0.1, 0.2, 0.3, 0.4}; (b) γ ∈ {0.2, 0.4, 0.6, 0.8}; (c) γ ∈ {0.3, 0.6, 0.9, 1.2}; an additional annotation b = 15 also appears in the figure.
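Putting the pieces together, here is a hypothetical driver for the Figure 1 comparison, reusing the `vector_field`, `extra_gradient`, and `past_extra_gradient` sketches above. The step-size grids come from the figure legends; everything else (the matrix construction, seed, horizon, and the distance-to-origin metric) is an assumption made for illustration.

```python
import numpy as np

def random_psd(rng, d):
    # A simple way to draw a symmetric PSD matrix (placeholder construction).
    M = rng.standard_normal((d, d)) / np.sqrt(d)
    return M @ M.T

def run_panel(step_sizes, d1=1000, d2=1000, eps1=1.0, eps2=1.0, T=1000, seed=0):
    rng = np.random.default_rng(seed)
    A1, A2 = random_psd(rng, d1), random_psd(rng, d1)
    B1, B2 = random_psd(rng, d2), random_psd(rng, d2)
    C = rng.standard_normal((d1, d2)) / np.sqrt(d1)
    V = lambda x: vector_field(x[:d1], x[d1:], A1, A2, B1, B2, C, eps1, eps2)
    identity = lambda x: x                      # X = R^d: no projection needed
    x0 = rng.standard_normal(d1 + d2)
    results = {}
    for gamma in step_sizes:
        gammas = np.full(T, gamma)              # fixed step size, as in the figure
        for name, method, calls in [("EG", extra_gradient, 2 * T),
                                    ("1-EG", past_extra_gradient, T)]:
            x = method(x0, V, identity, gammas, T)
            # Distance to the origin (the solution of (VI)) and oracle calls used.
            results[name, gamma] = (np.linalg.norm(x), calls)
    return results

# Step-size grids read off the figure legends, one per panel:
panel_grids = {"a": [0.1, 0.2, 0.3, 0.4],
               "b": [0.2, 0.4, 0.6, 0.8],
               "c": [0.3, 0.6, 0.9, 1.2]}
```

Note the oracle-call accounting: EG spends two calls per iteration and 1-EG only one, which is the sense in which the report's quoted caption says 1-EG outperforms EG "in terms of oracle calls for a fixed step-size".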