Training GANs with Optimism

Authors: Constantinos Daskalakis, Andrew Ilyas, Vasilis Syrgkanis, Haoyang Zeng

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply OMD WGAN training to a bioinformatics problem of generating DNA sequences. We observe that models trained with OMD achieve consistently smaller KL divergence with respect to the true underlying distribution than models trained with GD variants. Finally, we introduce a new algorithm, Optimistic Adam, which is an optimistic variant of Adam. We apply it to WGAN training on CIFAR10 and observe improved performance in terms of inception score as compared to Adam.
Researcher Affiliation | Collaboration | Constantinos Daskalakis (MIT, EECS) costis@mit.edu; Andrew Ilyas (MIT, EECS) ailyas@mit.edu; Vasilis Syrgkanis (Microsoft Research) vasy@microsoft.com; Haoyang Zeng (MIT, EECS) haoyangz@mit.edu
Pseudocode | Yes | Algorithm 1 Optimistic ADAM, proposed algorithm for training WGANs on images. (A hedged sketch of this update appears after the table.)
Open Source Code | Yes | Code for our models is available at https://github.com/vsyrgkanis/optimistic_GAN_training
Open Datasets | Yes | We apply optimism to training GANs for images and introduce the Optimistic Adam algorithm. We show that it achieves better performance than Adam, in terms of inception score, when trained on CIFAR10.
Dataset Splits | Yes | A random 10% of the sequences were held out as the validation set.
Hardware Specification | No | The paper does not provide specific details on the hardware used for experiments, such as GPU/CPU models, memory, or the computing environment.
Software Dependencies | No | The paper mentions software such as Adam and implies common libraries through its GAN architectures, but it does not specify version numbers for any software dependencies.
Experiment Setup | Yes | The same learning rate 0.0001 and betas (β1 = 0.5, β2 = 0.9) as in Appendix B of Gulrajani et al. (2017) were used for all the methods compared. We also matched other hyper-parameters such as gradient penalty coefficient λ and batch size. (An illustrative optimizer configuration follows the table.)
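
The optimism referenced throughout the table is an extra-gradient-style correction: optimistic mirror descent (OMD) replaces the plain step θ_(t+1) = θ_t - η ∇L(θ_t) with θ_(t+1) = θ_t - 2η ∇L(θ_t) + η ∇L(θ_(t-1)), and Optimistic Adam applies the same "twice the current step, add back the previous one" pattern to Adam's bias-corrected, variance-rescaled gradient. Below is a minimal NumPy sketch of that update; the function name, state tuple, and defaults are illustrative choices, not the authors' reference implementation (which is linked in the Open Source Code row).

    import numpy as np

    def optimistic_adam_step(theta, grad, state, lr=1e-4, beta1=0.5, beta2=0.9, eps=1e-8):
        # state = (m, v, prev_corr, t): first/second moment estimates, the previous
        # step's rescaled gradient, and the step counter. All names are illustrative.
        m, v, prev_corr, t = state
        t += 1
        m = beta1 * m + (1.0 - beta1) * grad           # Adam first-moment update
        v = beta2 * v + (1.0 - beta2) * grad ** 2      # Adam second-moment update
        m_hat = m / (1.0 - beta1 ** t)                 # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        corr = m_hat / (np.sqrt(v_hat) + eps)          # Adam's rescaled gradient direction
        # Optimistic update: take twice the current correction, then add back the previous one.
        theta = theta - 2.0 * lr * corr + lr * prev_corr
        return theta, (m, v, corr, t)

With prev_corr initialised to zero, the first call reduces to an ordinary Adam step with a doubled step size; from then on each update partially cancels the previous direction, which is the mechanism the paper credits for damping the cycling behaviour of plain gradient methods in GAN training.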
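
For the Experiment Setup row, the quoted hyper-parameters correspond to a standard Adam baseline configuration. Purely as an illustration (the paper does not prescribe a framework, and the placeholder networks here are ours), a PyTorch instantiation matching the stated values might look like:

    import torch
    import torch.nn as nn

    # Placeholder generator/discriminator; only the optimizer hyper-parameters
    # (lr = 0.0001, betas = (0.5, 0.9)) come from the paper's stated setup.
    generator = nn.Linear(128, 784)
    discriminator = nn.Linear(784, 1)

    opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.5, 0.9))
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4, betas=(0.5, 0.9))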