LayoutGPT: Compositional Visual Planning and Generation with Large Language Models
Authors: Weixi Feng, Wanrong Zhu, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Xuehai He, Sugato Basu, Xin Eric Wang, William Yang Wang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we provide an extensive evaluation of LayoutGPT for 2D text-to-image (T2I) synthesis and compare it with SOTA T2I models/systems. An ablation study is conducted to demonstrate the effect of individual components of LayoutGPT. |
| Researcher Affiliation | Collaboration | Weixi Feng¹, Wanrong Zhu¹, Tsu-Jui Fu¹, Varun Jampani², Arjun Akula², Xuehai He³, Sugato Basu², Xin Eric Wang³, William Yang Wang¹. ¹University of California, Santa Barbara; ²Google; ³University of California, Santa Cruz |
| Pseudocode | No | The paper refers to pseudocode in the related work section but does not contain structured pseudocode or algorithm blocks for its own method. |
| Open Source Code | Yes | https://github.com/weixi-feng/LayoutGPT |
| Open Datasets | Yes | To evaluate the generations in terms of specified counts and spatial locations, we propose NSR-1K, a benchmark that includes template-based and human-written (natural) prompts from MSCOCO [29]. Table 1 summarizes our dataset statistics with examples. For indoor scene synthesis, we use an updated version of the 3D-FRONT dataset [12, 13] following ATISS [38]. |
| Dataset Splits | Yes | Table 1 summarizes our dataset statistics with examples. ... we end up with 3397/453/423 for the train/val/test split of bedroom scenes and 690/98/53 for the train/val/test split of living-room scenes. (These counts are captured in the first sketch following the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions specific LLM models (e.g., "GPT-3.5/4", "Codex") and other foundational models (e.g., "GLIGEN", "Stable Diffusion") but does not list the ancillary software dependencies with version numbers (e.g., Python or PyTorch versions) needed to replicate the experiments. |
| Experiment Setup | Yes | For all LLMs, we fix the sampling temperature to 0.7 and apply no penalty to the next-token prediction. For the image-layout evaluation in Table 2, we fix the number of exemplars to 16 for numerical reasoning and 8 for spatial reasoning, based on the best results of a preliminary experiment. However, we do not observe significant gaps in evaluation results when using different numbers of exemplars (see Sec. B.4). For each prompt, we generate five different layouts/images using the baselines or LayoutGPT, yielding 3810 images for numerical reasoning and 1415 images for spatial reasoning in all reported evaluation results. For indoor scene synthesis, we fix the number of exemplars to 8 for bedrooms and 4 for living rooms to stay within the maximum allowed input tokens. We set the maximum output tokens to 512 for bedrooms and 1024 for living rooms, since bedrooms have 5 objects per room while living rooms have 11. We generate one layout for each rectangular floor plan for evaluation. (A decoding-configuration sketch follows the table.) |
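
The 3D-FRONT split sizes reported in the Dataset Splits row can be captured as a plain configuration, which is convenient when checking a reproduction against the paper's counts. This is a minimal sketch: the dictionary name and helper function are illustrative, not from the authors' released code.

```python
# Reported 3D-FRONT scene counts from the paper, as a plain config dict.
# DATASET_SPLITS and total_scenes are illustrative names, not the authors' code.
DATASET_SPLITS = {
    "bedroom":     {"train": 3397, "val": 453, "test": 423},
    "living_room": {"train": 690,  "val": 98,  "test": 53},
}

def total_scenes(room_type: str) -> int:
    """Sum the train/val/test scene counts for a room type."""
    return sum(DATASET_SPLITS[room_type].values())

print(total_scenes("bedroom"))      # 4273
print(total_scenes("living_room"))  # 841
```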
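
To make the decoding settings in the Experiment Setup row concrete, here is a minimal sketch assuming an OpenAI-style chat-completions backend. The function name `sample_layouts`, the model string, and the prompt handling are assumptions for illustration; only the temperature, penalty, sample-count, and max-token values come from the paper.

```python
# A minimal sketch of the reported decoding configuration, assuming an
# OpenAI chat-completions backend. Function name, model string, and prompt
# handling are placeholders, not the authors' released code.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def sample_layouts(prompt: str, n: int = 5, max_tokens: int = 512) -> list[str]:
    """Sample `n` layout completions for one serialized prompt.

    Per the paper: n=5 per T2I prompt, n=1 per rectangular floor plan;
    max_tokens=512 for bedrooms and 1024 for living rooms.
    """
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",   # placeholder; the paper uses GPT-3.5/4 and Codex
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,         # fixed sampling temperature (paper)
        frequency_penalty=0.0,   # "no penalty to the next-token prediction"
        presence_penalty=0.0,
        max_tokens=max_tokens,
        n=n,
    )
    return [choice.message.content for choice in response.choices]
```

The prompt itself would carry the in-context exemplars (16/8 for numerical/spatial reasoning, 8/4 for bedrooms/living rooms), which is why the exemplar count is bounded by the model's input-token limit rather than set in the API call.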