K-Beam Minimax: Efficient Optimization for Deep Adversarial Learning

Authors: Jihun Hamm, Yung-Kyun Noh

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | To demonstrate the advantages of the algorithm, we test the algorithm on the toy surfaces (Fig. 1) for which we know the true minimax solutions. For real-world demonstrations, we also test the algorithm on GAN problems (Goodfellow et al., 2014) and unsupervised domain-adaptation problems (Ganin & Lempitsky, 2015). Examples were chosen so that the performance can be measured objectively, by the Jensen-Shannon divergence for GAN and by cross-domain classification error for domain adaptation. Evaluations show that the proposed K-beam subgradient-descent approach can significantly improve stability and convergence speed of minimax optimization. (A histogram-based Jensen-Shannon divergence sketch appears after the table.) |
| Researcher Affiliation | Academia | ¹The Ohio State University, Columbus, OH, USA. ²Seoul National University, Seoul, Korea. |
| Pseudocode | Yes | Algorithm 1: K-beam ϵ-subgradient descent (a sketch of the K-beam update appears after the table). |
| Open Source Code | Yes | The codes for the project can be found at https://github.com/jihunhamm/k-beam-minimax. |
| Open Datasets | Yes | We train GANs with the proposed algorithm to learn a generative model of two-dimensional mixtures of Gaussians (MoGs). Let x be a sample from the MoG with the density p(x) = (1/7) Σ_{i=0}^{6} N((sin(πi/4), cos(πi/4)), (0.01)^2 I_2), and z be a sample from the 256-dimensional Gaussian distribution N(0, I_256). (A sampling sketch appears after the table.) |
| Dataset Splits | No | The paper describes the datasets used (MoGs, MNIST/MNIST-M) but does not provide specific train/validation/test dataset splits by percentage, count, or a reference to a standard split definition. |
| Hardware Specification | Yes | We measure the runtime of the algorithms by wall clock on the same system using a single NVIDIA GTX980 4GB GPU with a single Intel Core i7-2600 CPU. |
| Software Dependencies | No | The paper mentions the optimizers used (Adam optimizer, momentum optimizer) but does not provide specific software library names with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x). |
| Experiment Setup | Yes | Both G and D are two-layer tanh networks with 128 hidden units per layer, trained with the Adam optimizer with batch size 128 and learning rates of 10^-4 for the discriminator and 10^-3 for the generator. (A configuration sketch appears after the table.) |
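
Since the Pseudocode row points at Algorithm 1 (K-beam ϵ-subgradient descent), here is a minimal sketch of the underlying idea: keep K candidate maximizers, ascend each on f(u, v_k), and descend u along the gradient taken at the currently best candidate. The function names, the single-ascent-step-per-iteration schedule, the step sizes, and the toy surface are illustrative assumptions, not the paper's Algorithm 1 verbatim; see the repository above for the reference implementation.

```python
# Sketch of the K-beam minimax idea (assumptions noted in the text above).
import numpy as np

def k_beam_minimax(f, grad_u, grad_v, u0, v0_list, rho=1e-2, sigma=1e-2, steps=2000):
    """Approximately solve min_u max_v f(u, v) by tracking K candidate maximizers."""
    u = np.asarray(u0, dtype=float)
    vs = [np.asarray(v, dtype=float) for v in v0_list]
    for _ in range(steps):
        # Gradient-ascent step on every candidate maximizer v_k.
        vs = [v + sigma * grad_v(u, v) for v in vs]
        # Select the candidate that currently attains the largest f(u, v_k).
        k_best = int(np.argmax([f(u, v) for v in vs]))
        # (Sub)gradient-descent step on u, taken at the best candidate.
        u = u - rho * grad_u(u, vs[k_best])
    return u, vs

# Illustrative toy surface: f(u, v) = (u - 1)^2 - (v - u)^2 has minimax solution u = v = 1.
f = lambda u, v: (u - 1) ** 2 - (v - u) ** 2
grad_u = lambda u, v: 2 * (u - 1) + 2 * (v - u)
grad_v = lambda u, v: -2 * (v - u)
u_star, _ = k_beam_minimax(f, grad_u, grad_v, u0=0.0, v0_list=[-1.0, 0.5, 2.0])
```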
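The dataset quoted in the Open Datasets row is fully specified by its density, so it can be sampled directly. The sketch below draws from the 2-D mixture of 7 Gaussians with means (sin(πi/4), cos(πi/4)) and standard deviation 0.01, plus 256-dimensional Gaussian generator inputs; the function names and batch size shown are illustrative.

```python
# Sketch of the MoG data described in the Open Datasets row.
import numpy as np

rng = np.random.default_rng(0)
MEANS = np.array([[np.sin(np.pi * i / 4), np.cos(np.pi * i / 4)] for i in range(7)])

def sample_mog(n):
    """Draw n samples x ~ (1/7) * sum_{i=0}^{6} N((sin(pi*i/4), cos(pi*i/4)), (0.01)^2 I_2)."""
    idx = rng.integers(0, 7, size=n)               # pick a mixture component uniformly
    return MEANS[idx] + 0.01 * rng.standard_normal((n, 2))

def sample_noise(n):
    """Draw n generator inputs z ~ N(0, I_256)."""
    return rng.standard_normal((n, 256))

x_batch = sample_mog(128)    # real data batch
z_batch = sample_noise(128)  # latent batch
```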
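The Experiment Setup row gives the architecture and optimizer settings but, as noted in the Software Dependencies row, no framework version. The configuration sketch below uses PyTorch purely for illustration; the 128 hidden units, batch size 128, and learning rates 10^-4 (D) and 10^-3 (G) come from the quoted setup, while the 256→2 generator and 2→1 discriminator dimensions follow the MoG description above. "Two-layer tanh networks" is read here as two tanh hidden layers.

```python
# Sketch of the GAN setup quoted in the Experiment Setup row (framework choice is an assumption).
import torch
import torch.nn as nn

def two_layer_tanh(in_dim, out_dim, hidden=128):
    """Two tanh hidden layers with 128 units each, as described in the setup."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.Tanh(),
        nn.Linear(hidden, hidden), nn.Tanh(),
        nn.Linear(hidden, out_dim),
    )

G = two_layer_tanh(256, 2)  # generator: z in R^256 -> x in R^2
D = two_layer_tanh(2, 1)    # discriminator: x in R^2 -> logit

opt_D = torch.optim.Adam(D.parameters(), lr=1e-4)  # discriminator learning rate 10^-4
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)  # generator learning rate 10^-3
batch_size = 128
```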
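Finally, the Research Type row states that GAN performance is measured objectively by the Jensen-Shannon divergence. One way to approximate it from samples is to discretize both sample sets on a shared histogram grid; the binning scheme, smoothing constant, and function names below are illustrative assumptions, not the paper's exact evaluation protocol.

```python
# Sketch of a histogram-based Jensen-Shannon divergence estimate between two 2-D sample sets.
import numpy as np

def js_divergence(samples_p, samples_q, bins=50, eps=1e-10):
    """Approximate the JSD between two 2-D sample sets on a shared histogram grid."""
    lo = np.minimum(samples_p.min(axis=0), samples_q.min(axis=0))
    hi = np.maximum(samples_p.max(axis=0), samples_q.max(axis=0))
    edges = [np.linspace(lo[d], hi[d], bins + 1) for d in range(2)]

    p, _, _ = np.histogram2d(samples_p[:, 0], samples_p[:, 1], bins=edges)
    q, _, _ = np.histogram2d(samples_q[:, 0], samples_q[:, 1], bins=edges)
    p = (p.ravel() + eps) / (p.sum() + eps * p.size)   # smooth and normalize
    q = (q.ravel() + eps) / (q.sum() + eps * q.size)
    m = 0.5 * (p + q)

    kl = lambda a, b: np.sum(a * np.log(a / b))        # KL divergence of discrete distributions
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Example usage with the sampler above (generated_samples is a hypothetical (n, 2) array from G):
# jsd = js_divergence(sample_mog(10000), generated_samples)
```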