K-Beam Minimax: Efficient Optimization for Deep Adversarial Learning
Authors: Jihun Hamm, Yung-Kyun Noh
ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate the advantages of the algorithm, we test the algorithm on the toy surfaces (Fig. 1) for which we know the true minimax solutions. For real-world demonstrations, we also test the algorithm on GAN problems (Goodfellow et al., 2014), and unsupervised domain-adaptation problems (Ganin & Lempitsky, 2015). Examples were chosen so that the performance can be measured objectively by the Jensen-Shannon divergence for GAN and by cross-domain classification error for domain adaptation. Evaluations show that the proposed K-beam subgradient-descent approach can significantly improve stability and convergence speed of minimax optimization. |
| Researcher Affiliation | Academia | 1The Ohio State University, Columbus, OH, USA. 2Seoul National University, Seoul, Korea. |
| Pseudocode | Yes | Algorithm 1 K-beam ϵ-subgradient descent (a hedged sketch of the beam-update idea appears below the table) |
| Open Source Code | Yes | The codes for the project can be found at https://github.com/jihunhamm/k-beam-minimax. |
| Open Datasets | Yes | We train GANs with the proposed algorithm to learn a generative model of two-dimensional mixtures of Gaussians (MoGs). Let x be a sample from the MoG with the density p(x) = (1/7) Σ_{i=0}^{6} N((sin(πi/4), cos(πi/4)), (0.01)² I₂), and z be a sample from the 256-dimensional Gaussian distribution N(0, I₂₅₆). (A sampling sketch under stated assumptions appears below the table.) |
| Dataset Splits | No | The paper describes the datasets used (MoGs, MNIST/MNIST-M) but does not provide specific train/validation/test dataset splits by percentage, count, or a reference to a standard split definition. |
| Hardware Specification | Yes | We measure the runtime of the algorithms by wall clock on the same system using a single NVIDIA GTX980 4GB GPU with a single Intel Core i7-2600 CPU. |
| Software Dependencies | No | The paper mentions optimizers used (Adam optimizer, momentum optimizer) but does not provide specific software library names with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x). |
| Experiment Setup | Yes | Both G and D are two-layer tanh networks with 128 hidden units per layer, trained with the Adam optimizer with batch size 128 and learning rates of 10⁻⁴ for the discriminator and 10⁻³ for the generator. (A configuration sketch under stated assumptions appears below the table.) |
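
The Pseudocode row names Algorithm 1, K-beam ϵ-subgradient descent. The following is a minimal NumPy sketch of the K-beam idea on a toy saddle surface: keep K candidate maximizers ("beams"), update each by gradient ascent on the inner problem, and take the minimizer's descent step using the gradient at the currently best candidate. The toy function `f`, the step sizes, and the iteration count are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

# Toy minimax problem: min_u max_v f(u, v) on a simple saddle surface.
# f, step sizes, and iteration counts are illustrative assumptions.
def f(u, v):
    return (u - 0.5) * v - 0.1 * v ** 2   # concave in v, so the inner max is well defined

def grad_u(u, v):
    return v

def grad_v(u, v):
    return (u - 0.5) - 0.2 * v

K = 5                                    # number of beams (candidate maximizers)
rng = np.random.default_rng(0)
u = 0.0                                  # minimizer variable
V = rng.uniform(-1.0, 1.0, size=K)       # K candidate maximizer variables
eta_u, eta_v, steps = 0.05, 0.1, 500

for _ in range(steps):
    # Inner ascent: update every candidate maximizer in parallel.
    V = V + eta_v * np.array([grad_v(u, v) for v in V])
    # Pick the candidate that currently attains the largest value of f.
    k_star = int(np.argmax([f(u, v) for v in V]))
    # Outer descent: step the minimizer using the gradient at the best candidate.
    u = u - eta_u * grad_u(u, V[k_star])

print(f"approximate minimax point: u = {u:.3f}, best v = {V[k_star]:.3f}")
```

With K = 1 this reduces to ordinary alternating gradient descent-ascent; the extra beams are what the paper credits for improved stability on surfaces with multiple local maxima.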
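
The Open Datasets row describes the synthetic target for the GAN experiments: a 7-component mixture of 2-D Gaussians with means on the unit circle and covariance (0.01)²I₂, with latent codes drawn from a 256-dimensional standard Gaussian. A minimal NumPy sketch of that sampling procedure follows; the batch size of 128 here is taken from the experiment setup, everything else is read directly off the quoted density.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mog(n):
    """Draw n points from the 7-component mixture of 2-D Gaussians quoted above:
    means (sin(pi*i/4), cos(pi*i/4)) for i = 0..6, covariance (0.01)^2 * I_2."""
    means = np.array([[np.sin(np.pi * i / 4), np.cos(np.pi * i / 4)] for i in range(7)])
    comps = rng.integers(0, 7, size=n)             # uniform component weights 1/7
    return means[comps] + 0.01 * rng.standard_normal((n, 2))

def sample_latent(n):
    """Draw n latent codes z ~ N(0, I_256)."""
    return rng.standard_normal((n, 256))

x = sample_mog(128)        # real samples for the discriminator
z = sample_latent(128)     # generator inputs
print(x.shape, z.shape)    # (128, 2) (128, 256)
```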
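
The Experiment Setup row specifies two-layer tanh networks with 128 hidden units per layer for both G and D, Adam with batch size 128, and learning rates of 10⁻⁴ (discriminator) and 10⁻³ (generator). The PyTorch sketch below mirrors those stated choices; the choice of library, the sigmoid output of D, the 2-D output of G, and reading "two-layer" as two hidden layers are assumptions rather than details confirmed by the quoted text.

```python
import torch
import torch.nn as nn

# "Two-layer tanh networks with 128 hidden units per layer" is read here as
# two hidden layers; that reading, and the sigmoid output of D, are assumptions.
def two_layer_tanh(in_dim, out_dim):
    return nn.Sequential(
        nn.Linear(in_dim, 128), nn.Tanh(),
        nn.Linear(128, 128), nn.Tanh(),
        nn.Linear(128, out_dim),
    )

G = two_layer_tanh(256, 2)                              # generator: z in R^256 -> x in R^2
D = nn.Sequential(two_layer_tanh(2, 1), nn.Sigmoid())   # discriminator: x -> P(real)

opt_D = torch.optim.Adam(D.parameters(), lr=1e-4)       # learning rates from the table
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
batch_size = 128
```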