Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Compositional Transformers for Scene Generation
Authors: Dor Arad Hudson, Larry Zitnick
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate GANformer2's strengths and qualities through a careful evaluation over a range of datasets, from multi-object CLEVR scenes to the challenging COCO images, showing it successfully achieves state-of-the-art performance in terms of visual quality, diversity and consistency. |
| Researcher Affiliation | Collaboration | Drew A. Hudson Department of Computer Science Stanford University EMAIL C. Lawrence Zitnick Facebook AI Research Facebook, Inc. EMAIL |
| Pseudocode | No | No pseudocode or algorithm blocks found. |
| Open Source Code | Yes | See https://github.com/dorarad/gansformer for model implementation. |
| Open Datasets | Yes | We demonstrate GANformer2's strengths and qualities through a careful evaluation over a range of datasets, from multi-object CLEVR scenes to the challenging COCO images, showing it successfully achieves state-of-the-art performance in terms of visual quality, diversity and consistency. |
| Dataset Splits | No | We explore learning from two training sets: of images and layouts. Specifically, we use panoptic segmentations [43], which specify for every segment s_i its instance identity p_i and semantic category m_i, but other types of segmentations can likewise be used. |
| Hardware Specification | Yes | All models have been trained with resolution of 256 × 256 and for an equal number of training steps, roughly spanning 10 days on a single V100 GPU per model. |
| Software Dependencies | No | We implement the unconditional methods within our public GANformer codebase and use the authors' official implementations for the conditional ones. |
| Experiment Setup | Yes | All models have been trained with resolution of 256 × 256 and for an equal number of training steps, roughly spanning 10 days on a single V100 GPU per model. See section H for description of baselines and competing approaches, implementation details, hyperparameter settings, data preparations and training configuration. |