LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts

Authors: Hanan Gani, Shariq Farooq Bhat, Muzammal Naseer, Salman Khan, Peter Wonka

ICLR 2024

Reproducibility assessment: each variable below lists the result, followed by the supporting LLM response (quoted from the paper where applicable).
Research Type: Experimental
LLM Response: "Our evaluation on complex prompts featuring multiple objects demonstrates a substantial improvement in recall compared to baseline diffusion models. This is further validated by a user study, underscoring the efficacy of our approach in generating coherent and detailed scenes from intricate textual inputs. Our results showcase a significant improvement in recall (~85%) compared to the baseline Feng et al. (2023) (~69%), a +16% improvement. We also include a user study that demonstrates that our proposed method consistently produces coherent images that closely align with their respective textual descriptions, whereas existing approaches struggle to effectively handle longer text prompts." (A sketch of one way to compute such a recall metric appears after the table.)
Researcher Affiliation: Academia
LLM Response: "Hanan Gani¹, Shariq Farooq Bhat², Muzammal Naseer¹, Salman Khan¹,³, Peter Wonka²; ¹Mohamed Bin Zayed University of AI, ²KAUST, ³Australian National University; {hanan.ghani, muzammal.naseer, salman.khan}@mbzuai.ac.ae, shariq.bhat@kaust.edu.sa, pwonka@gmail.com"
Pseudocode: Yes
LLM Response: "We provide a pseudo code of our algorithm in Algorithm 1." (Algorithm 1: LLM Blueprint)
Open Source Code: Yes
LLM Response: "Our code is available at https://github.com/hananshafi/llmblueprint."
Open Datasets: Yes
LLM Response: "For acquiring the long text descriptions, we ask ChatGPT to generate scenes on various themes. In addition to this, we also use the textual descriptions from some COCO (Lin et al., 2014) and PASCAL (Everingham et al., 2010) images by querying an image captioning model (Zhu et al., 2023) to generate a detailed description spanning 80-100 words." (A hedged sketch of this collection step appears after the table.)
Dataset Splits: No
LLM Response: The paper uses datasets like COCO and PASCAL but does not provide specific numerical training/validation/test splits (e.g., percentages or sample counts) for reproducibility.
Hardware Specification: Yes
LLM Response: "Finally, our entire pipeline runs on a single Nvidia A100 40GB GPU."
Software Dependencies: Yes
LLM Response: "For implementation, we use PyTorch 2.0."
Experiment Setup: Yes
LLM Response: "We use 20 diffusion steps at this point. For box refinement, we use the pre-trained image composition model of Yang et al. (2023a) which conditions on a reference image. For each box refinement, we use 50 diffusion steps." From Algorithm 1: "Input: Long textual description C, diffusion steps k, sampling iterations n". (A pipeline sketch tying these settings together appears after the table.)
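
The headline result above is object recall on multi-object prompts. The snippet below illustrates one plausible form of such a metric: the fraction of objects mentioned in a prompt that an object detector actually finds in the generated image. This is an assumption about the metric's shape, not the paper's released evaluation code.

```python
# Minimal sketch (an assumption, not the paper's evaluation code) of an
# object-recall metric: of the objects a prompt requests, how many does a
# detector find in the generated image?
from typing import Set

def object_recall(prompt_objects: Set[str], detected_objects: Set[str]) -> float:
    """Fraction of prompt-mentioned objects recovered in the image."""
    if not prompt_objects:
        return 1.0  # vacuous prompt: nothing to miss
    return len(prompt_objects & detected_objects) / len(prompt_objects)

# Example: 3 of 4 requested objects were detected -> recall 0.75
print(object_recall({"dog", "sofa", "lamp", "window"}, {"dog", "sofa", "lamp"}))
```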
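The Open Datasets row says the long prompts were obtained by asking ChatGPT to generate themed scenes (plus captioning COCO/PASCAL images with the model of Zhu et al., 2023). Below is a hedged sketch of the ChatGPT half using the OpenAI Python client; the model name, prompt wording, and client usage are assumptions, since the paper does not publish this query code.

```python
# Hedged sketch of collecting themed 80-100 word scene descriptions via ChatGPT.
# The model name and prompt text are assumptions; only the general procedure
# ("ask ChatGPT to generate scenes on various themes") comes from the paper.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def themed_scene_description(theme: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed; the paper just says "ChatGPT"
        messages=[{
            "role": "user",
            "content": f"Describe a detailed scene on the theme '{theme}' "
                       "in 80-100 words, naming every object and its placement.",
        }],
    )
    return resp.choices[0].message.content
```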
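Finally, the Pseudocode and Experiment Setup rows together outline the shape of Algorithm 1: an LLM turns the long description C into a scene blueprint, a layout-conditioned diffusion model renders the global scene with 20 steps, and each object box is then refined with the reference-conditioned composition model of Yang et al. (2023a) using 50 steps. The following is a minimal runnable sketch of that flow, not the authors' implementation (see https://github.com/hananshafi/llmblueprint for that); every function body is a placeholder, and interpreting the sampling iterations n as refinement passes is an assumption.

```python
# Minimal sketch of the LLM Blueprint pipeline as quoted above; strings stand
# in for images and all function bodies are placeholders, NOT the real code.
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), normalized

@dataclass
class ObjectSpec:
    name: str         # object class, e.g. "table"
    box: Box          # placement proposed by the LLM
    description: str  # detailed per-object prompt

def generate_blueprint(long_prompt: str) -> Tuple[List[ObjectSpec], str]:
    # Placeholder: the paper queries an LLM for object boxes, per-object
    # prompts, and a background prompt.
    objects = [ObjectSpec("table", (0.1, 0.5, 0.6, 0.9), "a rustic wooden table")]
    return objects, "a cozy cabin interior"

def layout_to_image(objects: List[ObjectSpec], background: str, steps: int) -> str:
    # Placeholder for the layout-conditioned diffusion stage (20 steps).
    return f"<scene: {background}, {len(objects)} objects, {steps} steps>"

def refine_box(image: str, obj: ObjectSpec, steps: int) -> str:
    # Placeholder for the reference-conditioned composition model of
    # Yang et al. (2023a); the paper uses 50 steps per box.
    return image + f" [refined '{obj.name}' in {steps} steps]"

def llm_blueprint(long_description: str, k: int = 20, n: int = 1) -> str:
    """Mirrors Algorithm 1's inputs: description C, diffusion steps k,
    sampling iterations n (interpreted here as refinement passes)."""
    objects, background = generate_blueprint(long_description)
    image = layout_to_image(objects, background, steps=k)
    for _ in range(n):
        for obj in objects:
            image = refine_box(image, obj, steps=50)
    return image

print(llm_blueprint("A cozy cabin interior with a rustic wooden table ..."))
```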