Compositional Transformers for Scene Generation

Authors: Drew A. Hudson, C. Lawrence Zitnick

NeurIPS 2021

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | We demonstrate GANformer2's strengths and qualities through a careful evaluation over a range of datasets, from multi-object CLEVR scenes to the challenging COCO images, showing it successfully achieves state-of-the-art performance in terms of visual quality, diversity and consistency.
Researcher Affiliation | Collaboration | Drew A. Hudson, Department of Computer Science, Stanford University, dorarad@cs.stanford.edu; C. Lawrence Zitnick, Facebook AI Research, Facebook, Inc., zitnick@fb.com
Pseudocode | No | No pseudocode or algorithm blocks found.
Open Source Code | Yes | See https://github.com/dorarad/gansformer for model implementation.
Open Datasets | Yes | We demonstrate GANformer2's strengths and qualities through a careful evaluation over a range of datasets, from multi-object CLEVR scenes to the challenging COCO images, showing it successfully achieves state-of-the-art performance in terms of visual quality, diversity and consistency.
Dataset Splits | No | We explore learning from two training sets: one of images and one of layouts. Specifically, we use panoptic segmentations [43], which specify for every segment s_i its instance identity p_i and semantic category m_i, but other types of segmentations can likewise be used. (See the illustrative sketch after this table.)
Hardware Specification | Yes | All models have been trained with resolution of 256×256 and for an equal number of training steps, roughly spanning 10 days on a single V100 GPU per model.
Software Dependencies | No | We implement the unconditional methods within our public GANformer codebase and use the authors' official implementations for the conditional ones.
Experiment Setup | Yes | All models have been trained with resolution of 256×256 and for an equal number of training steps, roughly spanning 10 days on a single V100 GPU per model. See section H for description of baselines and competing approaches, implementation details, hyperparameter settings, data preparations and training configuration.
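For readers less familiar with the panoptic-segmentation format cited in the Dataset Splits row, the following is a minimal Python sketch of how such a layout (per-segment instance identity p_i and semantic category m_i over a segment id map) could be represented. All class and field names are illustrative assumptions and are not taken from the paper or the GANformer codebase.

```python
# Minimal sketch of a panoptic layout record (illustrative names, not from the paper's code).
# A panoptic segmentation assigns every pixel to exactly one segment s_i; each segment
# carries an instance identity p_i and a semantic category m_i.
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class Segment:
    segment_id: int    # s_i: id of the segment in the layout
    instance_id: int   # p_i: instance identity (distinguishes, say, two "person" segments)
    category: int      # m_i: semantic category label index


@dataclass
class PanopticLayout:
    """One training layout: a segment-id map over the image plus per-segment metadata."""
    segment_map: np.ndarray   # (H, W) integer array of segment ids
    segments: List[Segment]   # metadata for every segment id appearing in the map

    def category_map(self) -> np.ndarray:
        """Collapse the layout to a plain semantic map (one category id per pixel)."""
        lookup = {s.segment_id: s.category for s in self.segments}
        return np.vectorize(lookup.get)(self.segment_map)
```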