Improved Transformer for High-Resolution GANs

Authors: Long Zhao, Zizhao Zhang, Ting Chen, Dimitris Metaxas, Han Zhang

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show in the experiments that the proposed HiT achieves state-of-the-art FID scores of 30.83 and 2.95 on unconditional ImageNet 128×128 and FFHQ 256×256, respectively, with a reasonable throughput. We validate the proposed method on three datasets: ImageNet [49], CelebA-HQ [25], and FFHQ [28]. We also adopt ImageNet as the main test bed during the ablation study.
Researcher Affiliation | Collaboration | Long Zhao (1), Zizhao Zhang (2), Ting Chen (3), Dimitris N. Metaxas (1), Han Zhang (3); (1) Rutgers University, (2) Google Cloud AI, (3) Google Research
Pseudocode | No | The detailed algorithm can be found in the supplementary materials. This indicates that no pseudocode or algorithm blocks appear in the main paper.
Open Source Code | Yes | Our code is made publicly available at https://github.com/google-research/hit-gan.
Open Datasets | Yes | We validate the proposed method on three datasets: ImageNet [49], CelebA-HQ [25], and FFHQ [28].
Dataset Splits | No | The paper mentions random cropping for training and center cropping for testing (a preprocessing sketch follows the table), and it reports specific numbers of test images, but it does not explicitly describe a validation split used during model training.
Hardware Specification | Yes | All the models are trained using TPU for one million iterations on ImageNet and 500,000 iterations on FFHQ and CelebA-HQ. The throughput is measured on a single Tesla V100 GPU.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | Our model is trained with a standard non-saturating logistic GAN loss with R1 gradient penalty [40] applied to the discriminator. The R1 penalty penalizes the discriminator for deviating from the Nash equilibrium by penalizing the gradient on real data alone. The gradient penalty weight is set to 10. Adam [29] is utilized for optimization with β1 = 0 and β2 = 0.99. The learning rate is 0.0001 for both the generator and discriminator. All the models are trained using TPU for one million iterations on ImageNet and 500,000 iterations on FFHQ and CelebA-HQ. The mini-batch size is 256 for the image resolutions of 128×128 and 256×256, and 32 for the resolution of 1024×1024. A training-step sketch of this objective follows the table.
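
The cropping pipeline mentioned under Dataset Splits can be illustrated with a minimal sketch. This assumes TensorFlow image ops; the `preprocess` helper, its resize-then-crop strategy, and the default resolution are illustrative assumptions, not code from the official repository.

```python
import tensorflow as tf

def preprocess(image, resolution=128, training=True):
    """Resize the shorter side to `resolution`, then take a square crop.

    Hypothetical helper: random crop for training, center crop for
    evaluation, as described in the paper. Names and defaults are assumed.
    """
    shape = tf.shape(image)
    h = tf.cast(shape[0], tf.float32)
    w = tf.cast(shape[1], tf.float32)
    ratio = resolution / tf.minimum(h, w)
    new_h = tf.cast(tf.round(h * ratio), tf.int32)
    new_w = tf.cast(tf.round(w * ratio), tf.int32)
    image = tf.image.resize(image, [new_h, new_w])
    if training:
        # Random square crop during training.
        image = tf.image.random_crop(image, [resolution, resolution, 3])
    else:
        # Center crop during evaluation: take the middle square.
        top = (new_h - resolution) // 2
        left = (new_w - resolution) // 2
        image = tf.image.crop_to_bounding_box(image, top, left,
                                              resolution, resolution)
    return image
```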
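The Experiment Setup row can likewise be made concrete. Below is a minimal training-step sketch assuming generic Keras `generator` and `discriminator` models, a hypothetical `train_step` helper, and an assumed latent dimension; only the non-saturating logistic loss, the R1 penalty on real data, the penalty weight of 10, the Adam betas, and the learning rate of 0.0001 come from the setup quoted above. The γ/2 scaling of the penalty follows the common R1 convention and is itself an assumption.

```python
import tensorflow as tf

R1_WEIGHT = 10.0  # gradient penalty weight from the paper
g_opt = tf.keras.optimizers.Adam(learning_rate=1e-4, beta_1=0.0, beta_2=0.99)
d_opt = tf.keras.optimizers.Adam(learning_rate=1e-4, beta_1=0.0, beta_2=0.99)

def train_step(generator, discriminator, real_images, latent_dim=512):
    """One alternating GAN update; latent_dim is an assumed value."""
    z = tf.random.normal([tf.shape(real_images)[0], latent_dim])

    # Discriminator update: non-saturating logistic loss + R1 penalty.
    with tf.GradientTape() as d_tape:
        fake_images = generator(z, training=True)
        # Inner tape computes the gradient of D w.r.t. real images,
        # so the penalty touches real data alone.
        with tf.GradientTape() as r1_tape:
            r1_tape.watch(real_images)
            real_logits = discriminator(real_images, training=True)
        real_grads = r1_tape.gradient(tf.reduce_sum(real_logits), real_images)
        r1_penalty = tf.reduce_mean(
            tf.reduce_sum(tf.square(real_grads), axis=[1, 2, 3]))
        fake_logits = discriminator(fake_images, training=True)
        d_loss = (tf.reduce_mean(tf.nn.softplus(-real_logits))   # -log D(x)
                  + tf.reduce_mean(tf.nn.softplus(fake_logits))  # -log(1-D(G(z)))
                  + 0.5 * R1_WEIGHT * r1_penalty)                # assumed γ/2 scaling
    d_grads = d_tape.gradient(d_loss, discriminator.trainable_variables)
    d_opt.apply_gradients(zip(d_grads, discriminator.trainable_variables))

    # Generator update: non-saturating loss, -log D(G(z)).
    with tf.GradientTape() as g_tape:
        fake_logits = discriminator(generator(z, training=True), training=True)
        g_loss = tf.reduce_mean(tf.nn.softplus(-fake_logits))
    g_grads = g_tape.gradient(g_loss, generator.trainable_variables)
    g_opt.apply_gradients(zip(g_grads, generator.trainable_variables))
    return d_loss, g_loss
```

In this form the penalty only involves gradients with respect to real images, matching the paper's statement that R1 penalizes the gradient on real data alone.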