Adaptive Extra-Gradient Methods for Min-Max Optimization and Games

Authors: Kimon Antonakopoulos, Veronica Belmega, Panayotis Mertikopoulos

Venue: ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conclude in this section with a numerical illustration of the convergence properties of AdaProx in two different settings: a) bilinear min-max games; and b) a simple Wasserstein GAN in the spirit of Daskalakis et al. [12] with the aim of learning an unknown covariance matrix. ... Figure 2: Numerical comparison between the extra-gradient (EG), Bach–Levy (BL) and AdaProx algorithms (red circles, green squares and blue triangles respectively)."
Researcher Affiliation | Collaboration | Kimon Antonakopoulos, Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LIG, 38000 Grenoble, France (kimon.antonakopoulos@inria.fr); E. Veronica Belmega, ETIS/ENSEA, Univ. de Cergy-Pontoise, CNRS, France (belmega@ensea.fr); Panayotis Mertikopoulos, Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LIG, 38000 Grenoble, France & Criteo AI Lab (panayotis.mertikopoulos@imag.fr)
Pseudocode | Yes | $X_{t+1/2} = P_{X_t}(-\gamma_t V_t)$, $\delta_t = \langle V_{t+1/2} - V_t,\, X_{t+1/2} - X_{t+1} \rangle$, $X_{t+1} = P_{X_t}(-\gamma_t V_{t+1/2})$, $\gamma_{t+1} = 1 \big/ \sqrt{1 + \sum_{s=1}^{t} \delta_s^2}$ (AdaProx); see the sketch after the table.
Open Source Code | No | The paper does not provide information about open-source code for the described methodology.
Open Datasets | No | The paper describes generating synthetic data for its experiments (e.g., A drawn i.i.d. from a standard Gaussian) and defines a problem setup (e.g., a Wasserstein GAN formulation) rather than using named publicly available datasets with access information.
Dataset Splits | No | The paper does not specify explicit training, validation, or test dataset splits.
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | "For the experiments, we took d = 100, a mini-batch of m = 128 samples per update... The step-size parameter of the EG algorithm was chosen as γ_t = 0.025/√t, whereas the BL algorithm was run with diameter and gradient bound estimation parameters D_0 = 0.5 and M_0 = 2.5 respectively." A sketch reconstructing this setup follows the table.
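
To make the (AdaProx) update in the pseudocode row concrete, here is a minimal NumPy sketch, assuming the unconstrained Euclidean case so that the prox-mapping P_x(y) reduces to project(x + y); the names adaprox, oracle, and project are our own illustration choices, not from the paper.

    import numpy as np

    def adaprox(oracle, x0, n_steps=1000, project=lambda x: x):
        """Sketch of the AdaProx extra-gradient loop (Euclidean prox-mapping assumed)."""
        x = np.asarray(x0, dtype=float).copy()
        gamma = 1.0          # initial step-size gamma_1 (empty sum of deltas)
        sum_delta_sq = 0.0   # running sum of delta_s^2
        for _ in range(n_steps):
            v = oracle(x)                              # oracle call V_t
            x_half = project(x - gamma * v)            # leading state X_{t+1/2}
            v_half = oracle(x_half)                    # oracle call V_{t+1/2}
            x_next = project(x - gamma * v_half)       # updated state X_{t+1}
            delta = float(np.dot(v_half - v, x_half - x_next))  # delta_t
            sum_delta_sq += delta ** 2
            gamma = 1.0 / np.sqrt(1.0 + sum_delta_sq)  # gamma_{t+1}
            x = x_next
        return x

Note that the step-size only uses quantities already observed along the trajectory, which is what makes the method adaptive: unlike the EG and BL baselines, no Lipschitz constant, diameter, or gradient bound has to be supplied up front.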
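The experiment-setup row can likewise be reconstructed as a short script. This is a sketch under stated assumptions: A is drawn i.i.d. standard Gaussian (as the Open Datasets row notes), d = 100, and the EG baseline uses γ_t = 0.025/√t; the seed, horizon, and initialization are our own choices, and the mini-batch of m = 128 samples presumably concerns the stochastic runs, so it is omitted here.

    import numpy as np

    rng = np.random.default_rng(0)     # seed: our choice
    d = 100
    A = rng.standard_normal((d, d))    # entries i.i.d. standard Gaussian

    def V(z):
        """Joint gradient field of the bilinear game min_x max_y x^T A y."""
        x, y = z[:d], z[d:]
        return np.concatenate([A @ y, -A.T @ x])

    # Extra-gradient (EG) baseline with the reported schedule gamma_t = 0.025/sqrt(t).
    z = rng.standard_normal(2 * d)
    for t in range(1, 5001):
        gamma = 0.025 / np.sqrt(t)
        z_half = z - gamma * V(z)      # extrapolation step
        z = z - gamma * V(z_half)      # update step

    # For a generic (invertible) A the unique saddle point is (x*, y*) = (0, 0),
    # so the residual norm tracks convergence; adaprox(V, z) reuses the same field.
    print("||z_T|| =", np.linalg.norm(z))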