On Solving Minimax Optimization Locally: A Follow-the-Ridge Approach
Authors: Yuanhao Wang*, Guodong Zhang*, Jimmy Ba
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, FR solves toy minimax problems and improves the convergence of GAN training compared to the recent minimax optimization algorithms. |
| Researcher Affiliation | Academia | Yuanhao Wang (IIIS, Tsinghua University); Guodong Zhang (University of Toronto; Vector Institute); Jimmy Ba (University of Toronto; Vector Institute) |
| Pseudocode | Yes | Algorithm 1 Follow-the-Ridge (FR). Differences from gradient descent-ascent are shown in blue. (A hedged sketch of the FR update follows the table.) |
| Open Source Code | Yes | Our code is made public at: https://github.com/gd-zhang/Follow-the-Ridge |
| Open Datasets | Yes | We use the standard MNIST dataset (LeCun et al., 1998) |
| Dataset Splits | No | No dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) for training, validation, and testing was explicitly provided. For MNIST, the paper states “For each class, we take 4,800 training examples. Overall, we have 9,800 examples.” but does not detail how the remaining data is split for validation or testing. |
| Hardware Specification | No | No specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running experiments were mentioned in the paper. |
| Software Dependencies | No | No specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment were provided. The paper mentions “RMSprop” and “conjugate gradient” as methods, but not software implementations with version numbers. |
| Experiment Setup | Yes | To satisfy the non-singular Hessian assumption, we add L2 regularization (0.0002) to the discriminator. For both the generator and the discriminator, we use a 2-hidden-layer MLP with 64 hidden units in each layer and tanh activations. By default, RMSprop (Tieleman and Hinton, 2012) is used in all our experiments while the learning rate is tuned for GDA... For both the generator and the discriminator, we use a learning rate of 0.0002. In terms of network architectures, we use a 2-hidden-layer MLP with 512 hidden units in each layer for both the discriminator and the generator. For the discriminator, we use a Sigmoid activation in the output layer. We use RMSprop as our base optimizer in the experiments with batch size 2,000. We run both GDA and FR for 100,000 iterations. (An illustrative sketch of this configuration also follows the table.) |
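
The Pseudocode row above refers to Algorithm 1 (Follow-the-Ridge). Below is a minimal NumPy sketch of that update applied to a hypothetical quadratic game; the matrices `A`, `B`, `C`, the dimension, and the step sizes are illustrative assumptions, not values from the paper. The leader x takes a gradient-descent step, while the follower y takes a gradient-ascent step plus the correction term η_x H_yy^{-1} H_yx ∇_x f that keeps y close to the ridge.

```python
# Hedged sketch of the Follow-the-Ridge (FR) update (Algorithm 1) on an
# illustrative quadratic game f(x, y) = 0.5 x'Ax + x'By - 0.5 y'Cy.
# A, B, C, the dimension, and the step sizes are assumptions for this sketch.
import numpy as np

rng = np.random.default_rng(0)
d = 3
A = np.eye(d)                      # curvature for the minimizing player x
B = rng.standard_normal((d, d))    # coupling between the two players
C = 2.0 * np.eye(d)                # curvature for the maximizing player y (H_yy = -C is nonsingular)

def grad_x(x, y):
    return A @ x + B @ y           # grad_x f

def grad_y(x, y):
    return B.T @ x - C @ y         # grad_y f

def fr_step(x, y, lr_x=0.05, lr_y=0.05):
    gx, gy = grad_x(x, y), grad_y(x, y)
    # Ridge-following correction: lr_x * H_yy^{-1} H_yx grad_x f,
    # with H_yy = -C and H_yx = B^T for this quadratic.
    correction = lr_x * np.linalg.solve(-C, B.T @ gx)
    x_new = x - lr_x * gx                  # leader: gradient descent on x
    y_new = y + lr_y * gy + correction     # follower: gradient ascent plus correction
    return x_new, y_new

x, y = rng.standard_normal(d), rng.standard_normal(d)
for _ in range(2000):
    x, y = fr_step(x, y)
print(np.linalg.norm(x), np.linalg.norm(y))   # both shrink toward the local minimax point at the origin
```

This toy version solves the small linear system exactly; a practical implementation would avoid forming H_yy^{-1} and instead use an iterative solver such as conjugate gradient, which the paper mentions (see the Software Dependencies row).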
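
For the Experiment Setup row, the following PyTorch sketch illustrates the quoted MNIST configuration (2-hidden-layer MLPs with 512 units, Sigmoid output on the discriminator, RMSprop with learning rate 0.0002). The latent and input dimensions, the hidden activation, and the placement of the 0.0002 L2 regularization are assumptions made for this sketch; they are not fully specified in the quoted text.

```python
# Illustrative sketch of the quoted GAN setup; dimensions, hidden activation,
# and regularization placement are assumptions, not confirmed by the paper.
import torch
import torch.nn as nn

latent_dim, data_dim = 64, 784   # assumed latent size and flattened MNIST image size

generator = nn.Sequential(
    nn.Linear(latent_dim, 512), nn.Tanh(),   # hidden activation assumed
    nn.Linear(512, 512), nn.Tanh(),
    nn.Linear(512, data_dim),
)

discriminator = nn.Sequential(
    nn.Linear(data_dim, 512), nn.Tanh(),     # hidden activation assumed
    nn.Linear(512, 512), nn.Tanh(),
    nn.Linear(512, 1), nn.Sigmoid(),         # Sigmoid output layer, as quoted
)

# RMSprop base optimizer with learning rate 0.0002 for both players; the L2
# regularization (0.0002) is applied here as weight decay on the discriminator,
# which is one possible reading of the quoted setup.
opt_g = torch.optim.RMSprop(generator.parameters(), lr=2e-4)
opt_d = torch.optim.RMSprop(discriminator.parameters(), lr=2e-4, weight_decay=2e-4)
```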