ViTGAN: Training GANs with Vision Transformers
Authors: Kwonjoon Lee, Huiwen Chang, Lu Jiang, Han Zhang, Zhuowen Tu, Ce Liu
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, our approach, named ViTGAN, achieves comparable performance to the leading CNN-based GAN models on three datasets: CIFAR-10, CelebA, and LSUN bedroom. |
| Researcher Affiliation | Collaboration | Kwonjoon Lee1,3 Huiwen Chang2 Lu Jiang2 Han Zhang2 Zhuowen Tu1 Ce Liu4 1UC San Diego 2Google Research 3Honda Research Institute 4Microsoft Azure AI kwl042@eng.ucsd.edu {huiwenchang,lujiang,zhanghan}@google.com ztu@ucsd.edu ce.liu@microsoft.com |
| Pseudocode | No | The paper includes mathematical equations and architectural diagrams (e.g., Figure 1, Figure 2) but no explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Empirically, our approach, named ViTGAN, achieves comparable performance to the leading CNN-based GAN models on three datasets: CIFAR-10, CelebA, and LSUN bedroom. Our code is available online: https://github.com/mlpc-ucsd/ViTGAN |
| Open Datasets | Yes | We train and evaluate our model on various standard benchmarks for image generation, including CIFAR-10 (Krizhevsky et al., 2009), LSUN bedroom (Yu et al., 2015) and CelebA (Liu et al., 2015). |
| Dataset Splits | Yes | The LSUN bedroom dataset (Yu et al., 2015) is a large-scale image generation benchmark, consisting of 3 million training images and 300 images for validation. On this dataset, FID is computed against the training set due to the small validation set. |
| Hardware Specification | Yes | Both ViTGAN and StyleGAN2 are based on a TensorFlow 2 implementation and trained on Google Cloud TPU v2-32 and v3-8. |
| Software Dependencies | Yes | Both ViTGAN and StyleGAN2 are based on a TensorFlow 2 implementation and trained on Google Cloud TPU v2-32 and v3-8. |
| Experiment Setup | Yes | We train our models with Adam with β1 = 0.0, β2 = 0.99, and a learning rate of 0.002 following the practice of (Karras et al., 2020b). In addition, we employ non-saturating logistic loss (Goodfellow et al., 2014), exponential moving average of generator weights (Karras et al., 2018), and equalized learning rate (Karras et al., 2018). We use a mini-batch size of 128. |
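
The experiment-setup row quotes concrete optimizer, loss, and batch-size choices. Below is a minimal sketch of that configuration, assuming TensorFlow 2 (the framework named in the software row). The `generator` and `discriminator` models, the latent dimension, and the EMA decay value are illustrative assumptions not stated in the quoted text, and equalized learning rate is omitted.

```python
# Sketch of the quoted ViTGAN training setup; model definitions are assumed.
import tensorflow as tf

# Adam with beta1 = 0.0, beta2 = 0.99, learning rate 0.002 (as quoted).
optimizer_g = tf.keras.optimizers.Adam(learning_rate=0.002, beta_1=0.0, beta_2=0.99)
optimizer_d = tf.keras.optimizers.Adam(learning_rate=0.002, beta_1=0.0, beta_2=0.99)

def non_saturating_losses(real_logits, fake_logits):
    # Non-saturating logistic GAN loss (Goodfellow et al., 2014).
    d_loss = tf.reduce_mean(tf.math.softplus(-real_logits) +
                            tf.math.softplus(fake_logits))
    g_loss = tf.reduce_mean(tf.math.softplus(-fake_logits))
    return d_loss, g_loss

# Exponential moving average of generator weights; the decay value here is
# an assumption, not quoted from the paper.
ema = tf.train.ExponentialMovingAverage(decay=0.999)

def train_step(generator, discriminator, real_images, batch_size=128, z_dim=512):
    # batch_size=128 follows the quoted setup; z_dim is an assumption.
    z = tf.random.normal([batch_size, z_dim])
    with tf.GradientTape() as tape_g, tf.GradientTape() as tape_d:
        fake_images = generator(z, training=True)
        real_logits = discriminator(real_images, training=True)
        fake_logits = discriminator(fake_images, training=True)
        d_loss, g_loss = non_saturating_losses(real_logits, fake_logits)
    optimizer_d.apply_gradients(
        zip(tape_d.gradient(d_loss, discriminator.trainable_variables),
            discriminator.trainable_variables))
    optimizer_g.apply_gradients(
        zip(tape_g.gradient(g_loss, generator.trainable_variables),
            generator.trainable_variables))
    # Update EMA copies of the generator weights after each optimizer step.
    ema.apply(generator.trainable_variables)
    return d_loss, g_loss
```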