Compositional Transformers for Scene Generation
Authors: Drew A. Hudson, C. Lawrence Zitnick
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate GANformer2's strengths and qualities through a careful evaluation over a range of datasets, from multi-object CLEVR scenes to the challenging COCO images, showing it successfully achieves state-of-the-art performance in terms of visual quality, diversity and consistency. |
| Researcher Affiliation | Collaboration | Drew A. Hudson, Department of Computer Science, Stanford University (dorarad@cs.stanford.edu); C. Lawrence Zitnick, Facebook AI Research, Facebook, Inc. (zitnick@fb.com) |
| Pseudocode | No | No pseudocode or algorithm blocks found. |
| Open Source Code | Yes | See https://github.com/dorarad/gansformer for model implementation. |
| Open Datasets | Yes | We demonstrate GANformer2's strengths and qualities through a careful evaluation over a range of datasets, from multi-object CLEVR scenes to the challenging COCO images, showing it successfully achieves state-of-the-art performance in terms of visual quality, diversity and consistency. |
| Dataset Splits | No | We explore learning from two training sets: of images and layouts. Specifically, we use panoptic segmentations [43], which specify for every segment s_i its instance identity p_i and semantic category m_i, but other types of segmentations can likewise be used. |
| Hardware Specification | Yes | All models have been trained with resolution of 256×256 and for an equal number of training steps, roughly spanning 10 days on a single V100 GPU per model. |
| Software Dependencies | No | We implement the unconditional methods within our public GANformer codebase and use the authors' official implementations for the conditional ones. |
| Experiment Setup | Yes | All models have been trained with resolution of 256×256 and for an equal number of training steps, roughly spanning 10 days on a single V100 GPU per model. See section H for description of baselines and competing approaches, implementation details, hyperparameter settings, data preparations and training configuration. |
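The Dataset Splits row quotes the paper's layout representation: a panoptic segmentation assigns each segment s_i an instance identity p_i and a semantic category m_i. The sketch below illustrates that per-segment structure in plain Python; the `Segment` class and `layout_from_segments` helper are hypothetical names for illustration, not part of the official GANformer codebase.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    # Hypothetical container mirroring the paper's per-segment annotations:
    instance_id: int   # p_i: distinguishes, e.g., "car 1" from "car 2"
    category: str      # m_i: semantic class such as "car" or "sky"

def layout_from_segments(segments):
    """Group segment instance ids by semantic category to summarize a scene layout."""
    layout = {}
    for seg in segments:
        layout.setdefault(seg.category, []).append(seg.instance_id)
    return layout

# A toy three-segment scene: one sky region and two distinct car instances.
scene = [Segment(0, "sky"), Segment(1, "car"), Segment(2, "car")]
print(layout_from_segments(scene))  # {'sky': [0], 'car': [1, 2]}
```

Because p_i separates instances of the same category, the grouping keeps "car 1" and "car 2" distinct, which is what distinguishes panoptic from purely semantic segmentation.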