LayoutGPT: Compositional Visual Planning and Generation with Large Language Models
Authors: Weixi Feng, Wanrong Zhu, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Xuehai He, Sugato Basu, Xin Eric Wang, William Yang Wang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we provide an extensive evaluation of LayoutGPT for 2D text-to-image (T2I) synthesis and compare it with SOTA T2I models/systems. An ablation study is conducted to demonstrate the effect of individual components of LayoutGPT. |
| Researcher Affiliation | Collaboration | Weixi Feng¹, Wanrong Zhu¹, Tsu-Jui Fu¹, Varun Jampani², Arjun Akula², Xuehai He³, Sugato Basu², Xin Eric Wang³, William Yang Wang¹. ¹University of California, Santa Barbara; ²Google; ³University of California, Santa Cruz |
| Pseudocode | No | The paper refers to pseudocode in the related work section but does not contain structured pseudocode or algorithm blocks for its own method. |
| Open Source Code | Yes | https://github.com/weixi-feng/LayoutGPT |
| Open Datasets | Yes | To evaluate the generations in terms of specified counts and spatial locations, we propose NSR-1K, a benchmark that includes template-based and human-written (natural) prompts from MSCOCO [29]. Table 1 summarizes our dataset statistics with examples. For indoor scene synthesis, we use an updated version of the 3D-FRONT dataset [12, 13] following ATISS [38]. |
| Dataset Splits | Yes | Table 1 summarizes our dataset statistics with examples. ... we end up with 3397/453/423 for the train/val/test split of bedroom scenes and 690/98/53 for the train/val/test split of living-room scenes. (These counts are captured in the first sketch following the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions specific LLM models (e.g., "GPT-3.5/4", "Codex") and other foundational models (e.g., "GLIGEN", "Stable Diffusion") but does not list the ancillary software dependencies with version numbers (e.g., Python or PyTorch versions) needed to replicate the experiments. |
| Experiment Setup | Yes | For all LLMs, we fix the sampling temperature to 0.7 and apply no penalty to the next-token prediction. For the image-layout evaluation in Table 2, we fix the number of exemplars to 16 for numerical reasoning and 8 for spatial reasoning, based on the best results of a preliminary experiment. However, we do not observe significant gaps in evaluation results when using different numbers of exemplars (see Sec. B.4). For each prompt, we generate five different layouts/images using the baselines or LayoutGPT, yielding 3810 images for numerical reasoning and 1415 images for spatial reasoning in all reported evaluation results. For indoor scene synthesis, we fix the number of exemplars to 8 for bedrooms and 4 for living rooms to stay within the maximum allowed input tokens. We set the maximum output tokens to 512 for bedrooms and 1024 for living rooms, since bedrooms have 5 objects per room while living rooms have 11. We generate one layout for each rectangular floor plan for evaluation. (A decoding-configuration sketch follows the table.) |
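
The 3D-FRONT split sizes reported in the Dataset Splits row can be captured as a plain configuration, which is convenient when checking a reproduction against the paper's counts. This is a minimal sketch: the dictionary name and helper function are illustrative, not from the authors' released code.

```python
# Reported 3D-FRONT scene counts from the paper, as a plain config dict.
# DATASET_SPLITS and total_scenes are illustrative names, not the authors' code.
DATASET_SPLITS = {
    "bedroom":     {"train": 3397, "val": 453, "test": 423},
    "living_room": {"train": 690,  "val": 98,  "test": 53},
}

def total_scenes(room_type: str) -> int:
    """Sum the train/val/test scene counts for a room type."""
    return sum(DATASET_SPLITS[room_type].values())

print(total_scenes("bedroom"))      # 4273
print(total_scenes("living_room"))  # 841
```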
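
To make the decoding settings in the Experiment Setup row concrete, here is a minimal sketch assuming an OpenAI-style chat-completions backend. The function name `sample_layouts`, the model string, and the prompt handling are assumptions for illustration; only the temperature, penalty, sample-count, and max-token values come from the paper.

```python
# A minimal sketch of the reported decoding configuration, assuming an
# OpenAI chat-completions backend. Function name, model string, and prompt
# handling are placeholders, not the authors' released code.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def sample_layouts(prompt: str, n: int = 5, max_tokens: int = 512) -> list[str]:
    """Sample `n` layout completions for one serialized prompt.

    Per the paper: n=5 per T2I prompt, n=1 per rectangular floor plan;
    max_tokens=512 for bedrooms and 1024 for living rooms.
    """
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",   # placeholder; the paper uses GPT-3.5/4 and Codex
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,         # fixed sampling temperature (paper)
        frequency_penalty=0.0,   # "no penalty to the next-token prediction"
        presence_penalty=0.0,
        max_tokens=max_tokens,
        n=n,
    )
    return [choice.message.content for choice in response.choices]
```

The prompt itself would carry the in-context exemplars (16/8 for numerical/spatial reasoning, 8/4 for bedrooms/living rooms), which is why the exemplar count is bounded by the model's input-token limit rather than set in the API call.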