Generative Adversarial Transformers
Authors: Drew A. Hudson, C. Lawrence Zitnick
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the model's strength and robustness through a careful evaluation over a range of datasets, from simulated multi-object environments to rich real-world indoor and outdoor scenes, showing it achieves state-of-the-art results in terms of image quality and diversity, while enjoying fast learning and better data-efficiency. Further qualitative and quantitative experiments offer us an insight into the model's inner workings, revealing improved interpretability and stronger disentanglement, and illustrating the benefits and efficacy of our approach. We investigate the GANsformer through a suite of experiments to study its quantitative performance and qualitative behavior. |
| Researcher Affiliation | Collaboration | Drew A. Hudson (Computer Science Department, Stanford University, CA, USA) and C. Lawrence Zitnick (Facebook AI Research, CA, USA). Correspondence to: Drew A. Hudson <dorarad@cs.stanford.edu>. |
| Pseudocode | No | The paper describes the Simplex Attention and Duplex Attention using mathematical formulations and prose, but it does not include a clearly labeled pseudocode block or algorithm figure (an illustrative sketch of this attention pattern is provided below the table). |
| Open Source Code | Yes | An implementation of the model is available at https://github.com/dorarad/gansformer. |
| Open Datasets | Yes | We investigate the GANsformer through a suite of experiments to study its quantitative performance and qualitative behavior. As we will see below, the GANsformer achieves state-of-the-art results, successfully producing high-quality images for a varied assortment of datasets: FFHQ for human faces (Karras et al., 2019), CLEVR for multi-object scenes (Johnson et al., 2017), and the LSUN-Bedroom (Yu et al., 2015) and Cityscapes (Cordts et al., 2016) datasets for challenging indoor and outdoor scenes. |
| Dataset Splits | No | The paper mentions training models with images of 256x256 resolution for a certain number of training steps, and evaluates them on various metrics, but it does not specify explicit training, validation, and test dataset splits (e.g., percentages or exact counts) for reproducibility. |
| Hardware Specification | Yes | All models have been trained with images of 256×256 resolution and for the same number of training steps, roughly spanning a week on 2 NVIDIA V100 GPUs per model (or equivalently 3-4 days using 4 GPUs). |
| Software Dependencies | No | The paper states 'we implement them all within the codebase introduced by the StyleGAN authors. The only exception to that is the recent VQGAN model for which we use the official implementation.' However, it does not provide specific version numbers for software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | No | The paper mentions adopting 'settings and techniques used in the StyleGAN and StyleGAN2 models (Karras et al., 2019; 2020), including in particular style mixing, stochastic variation, exponential moving average for weights, and a non-saturating logistic loss with lazy R1 regularization.' It also states 'All models have been trained with images of 256×256 resolution and for the same number of training steps.' However, it defers specific hyperparameter settings to 'supplementary material A for further implementation details, hyperparameter settings and training configuration,' indicating these details are not in the main text. |
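
The Pseudocode row above notes that Simplex and Duplex Attention are conveyed only through equations and prose. As a rough reading aid, the following is a minimal NumPy sketch of the bipartite attention-and-modulation pattern those formulations describe: image features attend to a small set of latents and are then re-normalized and modulated by what they gathered, with Duplex Attention adding a reverse pass in which the latents first attend back to the image. This is not the authors' implementation (that is linked in the Open Source Code row); the function names, weight shapes, normalization axis, and the exact form of the two-way exchange are simplifying assumptions made for illustration.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def simplex_attention(X, Y, Wq, Wk, Wv, Wg, Wb, eps=1e-8):
    """Sketch of simplex-style attention: X (n, d) image features are
    modulated by Y (m, d) latents. All weight matrices are (d, d) and
    are illustrative assumptions, not the paper's parameterization."""
    d = X.shape[-1]
    # Image positions act as queries; latents supply keys and values.
    A = softmax((X @ Wq) @ (Y @ Wk).T / np.sqrt(d)) @ (Y @ Wv)   # (n, d)
    # Normalize X, then re-scale and shift it with attention-derived
    # statistics, in the spirit of style modulation.
    Xn = (X - X.mean(-1, keepdims=True)) / (X.std(-1, keepdims=True) + eps)
    return (A @ Wg) * Xn + (A @ Wb)

def duplex_attention(X, Y, params_x_from_y, params_y_from_x):
    """Sketch of the two-way (duplex) exchange: latents first gather
    information from the image, then modulate it."""
    Y = simplex_attention(Y, X, *params_y_from_x)    # latents <- image
    return simplex_attention(X, Y, *params_x_from_y) # image <- updated latents

if __name__ == "__main__":
    # Toy usage: a 16x16 feature grid of width 32 and 8 latents.
    rng = np.random.default_rng(0)
    n, m, d = 16 * 16, 8, 32
    X = rng.standard_normal((n, d))
    Y = rng.standard_normal((m, d))
    params = [rng.standard_normal((d, d)) * 0.1 for _ in range(5)]
    print(simplex_attention(X, Y, *params).shape)    # (256, 32)
```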