ViTGAN: Training GANs with Vision Transformers

Authors: Kwonjoon Lee, Huiwen Chang, Lu Jiang, Han Zhang, Zhuowen Tu, Ce Liu

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, our approach, named ViTGAN, achieves comparable performance to the leading CNN-based GAN models on three datasets: CIFAR-10, CelebA, and LSUN bedroom.
Researcher Affiliation | Collaboration | Kwonjoon Lee (1,3), Huiwen Chang (2), Lu Jiang (2), Han Zhang (2), Zhuowen Tu (1), Ce Liu (4); 1 UC San Diego, 2 Google Research, 3 Honda Research Institute, 4 Microsoft Azure AI. Emails: kwl042@eng.ucsd.edu, {huiwenchang,lujiang,zhanghan}@google.com, ztu@ucsd.edu, ce.liu@microsoft.com
Pseudocode | No | The paper includes mathematical equations and architectural diagrams (e.g., Figure 1, Figure 2) but no explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Empirically, our approach, named ViTGAN, achieves comparable performance to the leading CNN-based GAN models on three datasets: CIFAR-10, CelebA, and LSUN bedroom. Our code is available online: https://github.com/mlpc-ucsd/ViTGAN
Open Datasets | Yes | We train and evaluate our model on various standard benchmarks for image generation, including CIFAR-10 (Krizhevsky et al., 2009), LSUN bedroom (Yu et al., 2015) and CelebA (Liu et al., 2015).
Dataset Splits | Yes | The LSUN bedroom dataset (Yu et al., 2015) is a large-scale image generation benchmark, consisting of 3 million training images and 300 images for validation. On this dataset, FID is computed against the training set due to the small validation set.
Hardware Specification | Yes | Both ViTGAN and StyleGAN2 are based on a TensorFlow 2 implementation and trained on Google Cloud TPU v2-32 and v3-8.
Software Dependencies | Yes | Both ViTGAN and StyleGAN2 are based on a TensorFlow 2 implementation and trained on Google Cloud TPU v2-32 and v3-8.
Experiment Setup | Yes | We train our models with Adam with β1 = 0.0, β2 = 0.99, and a learning rate of 0.002, following the practice of Karras et al. (2020b). In addition, we employ the non-saturating logistic loss (Goodfellow et al., 2014), exponential moving average of generator weights (Karras et al., 2018), and equalized learning rate (Karras et al., 2018). We use a mini-batch size of 128.
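As a rough illustration of the quoted setup, the sketch below expresses the stated optimizer, loss, and batch-size choices in TensorFlow 2, the framework named above. The generator/discriminator handles and the EMA decay value are assumptions for illustration, not details taken from the paper.

```python
import tensorflow as tf

# Minimal sketch of the quoted training configuration (not the authors' code).

# Adam with beta1 = 0.0, beta2 = 0.99 and learning rate 0.002, as stated in the paper.
g_optimizer = tf.keras.optimizers.Adam(learning_rate=2e-3, beta_1=0.0, beta_2=0.99)
d_optimizer = tf.keras.optimizers.Adam(learning_rate=2e-3, beta_1=0.0, beta_2=0.99)

def non_saturating_g_loss(fake_logits):
    # Non-saturating logistic generator loss: -log(sigmoid(D(G(z)))) = softplus(-D(G(z))).
    return tf.reduce_mean(tf.nn.softplus(-fake_logits))

def logistic_d_loss(real_logits, fake_logits):
    # Discriminator counterpart: softplus(-D(x)) + softplus(D(G(z))).
    return tf.reduce_mean(tf.nn.softplus(-real_logits)) + tf.reduce_mean(tf.nn.softplus(fake_logits))

BATCH_SIZE = 128   # mini-batch size reported in the paper
EMA_DECAY = 0.999  # assumed decay; the paper only states that generator weights are averaged
```

Equalized learning rate and the exponential moving average of generator weights would be applied inside the model definition and the training loop, respectively; they are omitted here for brevity.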