Generative Adversarial Transformers

Authors: Drew A. Hudson, Larry Zitnick

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the model's strength and robustness through a careful evaluation over a range of datasets, from simulated multi-object environments to rich real-world indoor and outdoor scenes, showing it achieves state-of-the-art results in terms of image quality and diversity, while enjoying fast learning and better data-efficiency. Further qualitative and quantitative experiments offer us an insight into the model's inner workings, revealing improved interpretability and stronger disentanglement, and illustrating the benefits and efficacy of our approach. We investigate the GANsformer through a suite of experiments to study its quantitative performance and qualitative behavior.
Researcher Affiliation | Collaboration | Drew A. Hudson (Computer Science Department, Stanford University, CA, USA) and C. Lawrence Zitnick (Facebook AI Research, CA, USA). Correspondence to: Drew A. Hudson <dorarad@cs.stanford.edu>.
Pseudocode | No | The paper describes Simplex Attention and Duplex Attention using mathematical formulations and prose, but it does not include a clearly labeled pseudocode block or algorithm figure. (A hedged sketch of such an attention update is given after this table.)
Open Source Code | Yes | An implementation of the model is available at https://github.com/dorarad/gansformer.
Open Datasets | Yes | We investigate the GANsformer through a suite of experiments to study its quantitative performance and qualitative behavior. As we will see below, the GANsformer achieves state-of-the-art results, successfully producing high-quality images for a varied assortment of datasets: FFHQ for human faces (Karras et al., 2019), CLEVR for multi-object scenes (Johnson et al., 2017), and the LSUN-Bedroom (Yu et al., 2015) and Cityscapes (Cordts et al., 2016) datasets for challenging indoor and outdoor scenes.
Dataset Splits | No | The paper mentions training models with images of 256×256 resolution for a certain number of training steps and evaluates them on various metrics, but it does not specify explicit training, validation, and test dataset splits (e.g., percentages or exact counts) for reproducibility.
Hardware Specification | Yes | All models have been trained with images of 256×256 resolution and for the same number of training steps, roughly spanning a week on 2 NVIDIA V100 GPUs per model (or equivalently 3-4 days using 4 GPUs).
Software Dependencies | No | The paper states 'we implement them all within the codebase introduced by the StyleGAN authors. The only exception to that is the recent VQGAN model for which we use the official implementation.' However, it does not provide specific version numbers for software dependencies like Python, PyTorch, or CUDA.
Experiment Setup | No | The paper mentions adopting 'settings and techniques used in the StyleGAN and StyleGAN2 models (Karras et al., 2019; 2020), including in particular style mixing, stochastic variation, exponential moving average for weights, and a non-saturating logistic loss with lazy R1 regularization.' It also states 'All models have been trained with images of 256×256 resolution and for the same number of training steps.' However, it defers specific hyperparameter settings to 'supplementary material A for further implementation details, hyperparameter settings and training configuration,' indicating these details are not given in the main text. (A hedged sketch of this loss formulation appears at the end of this section.)
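As noted in the Pseudocode row, the paper specifies its Simplex and Duplex Attention only through equations and prose. For orientation, below is a minimal sketch of what a simplex-style bipartite attention update could look like: image features attend to a small set of latent variables, and the aggregated latent information modulates the normalized features through a learned gain and bias. All module names, parameter names, and dimensions here are illustrative assumptions, not the official implementation (which is available in the repository linked above).

import torch
import torch.nn as nn

class SimplexAttention(nn.Module):
    # Hypothetical sketch of a GANsformer-style simplex attention update:
    # image features x attend to a small set of latents y, and the aggregated
    # latent information modulates x through a learned gain and bias.
    # Names and dimensions are illustrative assumptions.
    def __init__(self, dim_x, dim_y, dim_attn):
        super().__init__()
        self.to_q = nn.Linear(dim_x, dim_attn)      # queries from image features
        self.to_k = nn.Linear(dim_y, dim_attn)      # keys from latents
        self.to_v = nn.Linear(dim_y, dim_attn)      # values from latents
        self.to_gain = nn.Linear(dim_attn, dim_x)   # gamma(.)-style mapping
        self.to_bias = nn.Linear(dim_attn, dim_x)   # beta(.)-style mapping
        self.norm = nn.LayerNorm(dim_x, elementwise_affine=False)

    def forward(self, x, y):
        # x: (batch, n_pixels, dim_x), y: (batch, n_latents, dim_y)
        q, k, v = self.to_q(x), self.to_k(y), self.to_v(y)
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        a = attn @ v                                # aggregated latent info per pixel
        # modulate normalized features with latent-dependent gain and bias
        return self.to_gain(a) * self.norm(x) + self.to_bias(a)

# Example usage (shapes only):
# layer = SimplexAttention(dim_x=512, dim_y=32, dim_attn=64)
# out = layer(torch.randn(2, 16 * 16, 512), torch.randn(2, 16, 32))  # -> (2, 256, 512)

The paper's duplex variant additionally propagates information in the opposite direction (latents are also updated from the image features); that direction is omitted here for brevity.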
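As context for the Experiment Setup row, the non-saturating logistic loss and R1 regularization that the paper adopts from StyleGAN/StyleGAN2 have a standard, compact form. The sketch below follows that commonly used formulation rather than the paper's supplementary material, so the gamma value, regularization interval, and function names are assumptions.

import torch
import torch.nn.functional as F

def d_logistic_loss(real_logits, fake_logits):
    # non-saturating logistic discriminator loss
    return F.softplus(fake_logits).mean() + F.softplus(-real_logits).mean()

def g_nonsaturating_loss(fake_logits):
    # generator pushes the discriminator's logits on generated images upward
    return F.softplus(-fake_logits).mean()

def r1_penalty(discriminator, real_images, gamma=10.0):
    # R1 regularization: squared gradient norm of D, on real images only
    real_images = real_images.detach().requires_grad_(True)
    logits = discriminator(real_images)
    grads, = torch.autograd.grad(logits.sum(), real_images, create_graph=True)
    return (gamma / 2) * grads.pow(2).sum(dim=[1, 2, 3]).mean()

# "Lazy" regularization applies the R1 term only every reg_interval minibatches,
# scaled by the interval so its expected strength is unchanged (values assumed):
# if step % reg_interval == 0:
#     d_loss = d_loss + r1_penalty(D, reals) * reg_interval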