Convergence of Gradient Methods on Bilinear Zero-Sum Games
Authors: Guojun Zhang, Yaoliang Yu
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5, Experiments. Bilinear game: We run experiments on a simple bilinear game and choose the optimal parameters as suggested in Theorems 4.1 and 4.2. The results are shown in the left panel of Figure 1, which confirms the predicted linear rates. Density plots: We show the density plots (heat maps) of the spectral radii in Figure 2. We make plots for EG, OGD and momentum with both Jacobi and GS updates. These plots are made when β1 = β2 = β and they agree with our theorems in Section 3. Wasserstein GAN: As in Daskalakis et al. (2018), we consider a WGAN (Arjovsky et al., 2017) that learns the mean of a Gaussian... Mixtures of Gaussians (GMMs): Our last experiment is on learning GMMs with a vanilla GAN (Goodfellow et al., 2014) that does not directly fall into our analysis. We choose a 3-hidden-layer ReLU network for both the generator and the discriminator, and each hidden layer has 256 units. (A minimal sketch of the compared gradient updates appears after this table.) |
| Researcher Affiliation | Academia | Guojun Zhang & Yaoliang Yu, Department of Computer Science, University of Waterloo; Vector Institute. {guojun.zhang,yaoliang.yu}@uwaterloo.ca |
| Pseudocode | No | The paper describes algorithms using equations and textual descriptions but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper mentions using a Wasserstein GAN (WGAN) and training on Mixtures of Gaussians (GMMs) with a vanilla GAN, citing 'Arjovsky et al., 2017' and 'Goodfellow et al., 2014'. However, no specific access information (URL, DOI, or explicit repository name) for a public dataset is provided. The WGAN setup describes learning a mean of a Gaussian, implying synthetic or custom data, and GMMs are target distributions rather than a named public dataset with specific access instructions. |
| Dataset Splits | No | The paper does not provide specific details on training, validation, or test dataset splits (e.g., percentages, sample counts, or explicit splitting methodologies). |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware used for running the experiments (e.g., GPU/CPU models, memory specifications). |
| Software Dependencies | No | The paper mentions 'Mathematica code' in the appendices but does not provide specific version numbers for this or any other software dependencies, libraries, or solvers used for the experiments. |
| Experiment Setup | Yes | We run experiments on a simple bilinear game and choose the optimal parameters as suggested in Theorems 4.1 and 4.2... Inspired by Theorem 4.1, we compare the convergence of two EGs with the same parameter β = αγ, and find that with scaling, EG has better convergence, as shown in the right panel of Figure 1... In Figure 3, we can see that GS updates converge even if the corresponding Jacobi updates do not. For EG, γ = 0.2, α = 0.02; for OGD, α = 0.2, β1 = 0.1, β2 = 0; for momentum, α = 0.08, β = 0.1... We choose a 3-hidden-layer ReLU network for both the generator and the discriminator, and each hidden layer has 256 units... stochastic GD (step size α = 0.01)... stochastic OGD (α = 2β = 0.02)... Adam, with step size α = 0.0002, β1 = 0.9, and β2 = 0.999. (A hedged reconstruction of the GAN architecture appears after this table.) |
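
To make the bilinear-game comparison concrete, the following minimal sketch (not the authors' code) runs simultaneous (Jacobi) gradient descent-ascent, alternating (Gauss-Seidel) gradient descent-ascent, and extragradient on the scalar game f(x, y) = xy, whose saddle point is (0, 0). The EG step sizes γ = 0.2 and α = 0.02 echo the setting quoted above; the starting point, iteration count, and the other step sizes are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of the update rules compared in
# the paper, on the scalar bilinear game f(x, y) = x * y with saddle
# point (0, 0).  EG step sizes echo the quoted setting (gamma = 0.2,
# alpha = 0.02); the initial point and iteration count are arbitrary.
import numpy as np

def gda_jacobi(x, y, alpha):
    # Simultaneous update: both players use gradients at the old iterate.
    return x - alpha * y, y + alpha * x

def gda_gauss_seidel(x, y, alpha):
    # Alternating update: the second player sees the fresh x.
    x_new = x - alpha * y
    return x_new, y + alpha * x_new

def extragradient(x, y, alpha, gamma):
    # Extrapolate with step gamma, then update with step alpha.
    x_half, y_half = x - gamma * y, y + gamma * x
    return x - alpha * y_half, y + alpha * x_half

pts = {"Jacobi GDA": (1.0, 1.0), "GS GDA": (1.0, 1.0), "EG": (1.0, 1.0)}
for _ in range(2000):
    pts["Jacobi GDA"] = gda_jacobi(*pts["Jacobi GDA"], alpha=0.02)
    pts["GS GDA"] = gda_gauss_seidel(*pts["GS GDA"], alpha=0.02)
    pts["EG"] = extragradient(*pts["EG"], alpha=0.02, gamma=0.2)

for name, (x, y) in pts.items():
    # On this toy game, Jacobi GDA slowly diverges, GS GDA stays bounded
    # without converging, and EG contracts linearly to the saddle point.
    print(f"{name}: distance to saddle = {np.hypot(x, y):.4f}")
```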
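
The GMM experiment is described only at the level of architecture and optimizer settings: 3-hidden-layer ReLU MLPs with 256 units per hidden layer, trained with Adam (α = 0.0002, β1 = 0.9, β2 = 0.999). The sketch below reconstructs that setup under assumptions the paper does not state: the latent dimension, the 2-D data dimension, and the choice of PyTorch are all hypothetical.

```python
# Hedged reconstruction of the GMM experiment's networks.  Only the
# hidden-layer sizes, depth, activation, and Adam hyperparameters come
# from the paper; dimensions and the framework are assumptions.
import torch.nn as nn
import torch.optim as optim

def mlp(in_dim, out_dim, hidden=256, n_hidden=3):
    # 3 hidden ReLU layers of 256 units, then a linear output layer.
    layers, d = [], in_dim
    for _ in range(n_hidden):
        layers += [nn.Linear(d, hidden), nn.ReLU()]
        d = hidden
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)

generator = mlp(in_dim=16, out_dim=2)      # latent dim 16 is assumed
discriminator = mlp(in_dim=2, out_dim=1)   # 2-D mixture samples assumed

opt_g = optim.Adam(generator.parameters(), lr=2e-4, betas=(0.9, 0.999))
opt_d = optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.9, 0.999))
```

The WGAN mean-learning experiment and the stochastic GD/OGD baselines quoted above would fit the same template by swapping the loss and update rule, but the quoted text does not specify them closely enough to reconstruct here.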