R&B: Region and Boundary Aware Zero-shot Grounded Text-to-image Generation

Authors: Jiayu Xiao, Henglei Lv, Liang Li, Shuhui Wang, Qingming Huang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our method outperforms existing state-of-the-art zero-shot grounded T2I generation methods by a large margin both qualitatively and quantitatively on several benchmarks.
Researcher Affiliation | Academia | (1) Key Lab of Intelligent Information Processing, ICT, CAS, Beijing, China; (2) University of Chinese Academy of Sciences, Beijing, China; (3) Peng Cheng Laboratory, Shenzhen, China
Pseudocode | No | The paper describes the method's steps in prose and refers to figures, but does not provide a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | Project page: https://sagileo.github.io/Region-and-Boundary
Open Datasets | Yes | We use two benchmarks: HRS (Bakr et al., 2023) and Drawbench (Saharia et al., 2022). ... We select 100 samples from the MS-COCO (Lin et al., 2014) dataset and create triplets consisting of image caption, object phrases and bounding boxes.
Dataset Splits | No | The paper names the evaluation datasets but does not report explicit training/validation/test splits, split percentages, or per-split sample counts for its experimental setup.
Hardware Specification | No | The paper does not describe the hardware (e.g., GPU or CPU models) used to run its experiments. Hardware is mentioned only in connection with a compared method (GLIGEN), not for R&B itself.
Software Dependencies | No | The paper names "Stable Diffusion V-1.5" as the base model and the "DDIM scheduler," but does not list software dependencies with version numbers (e.g., Python, PyTorch/TensorFlow, or other libraries).
Experiment Setup | Yes | We adopt the DDIM scheduler (Song et al., 2020a) with 50 denoising steps. The ratio of classifier-free guidance is set as 7.5. ... we only perform layout guidance at the first 10 steps. The λ for dynamic threshold in Eq. (6) is set as 0.4, the ratios λs and λa in Eq. (11) are set as 1.5 and 1.0 respectively.
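The Experiment Setup row lists the sampling hyperparameters. The minimal Python sketch below collects them in one place and shows how they would be consumed during sampling. It makes several assumptions. Steps use 0-based indexing in sampling order, so "the first 10 steps" means indices 0 through 9. The field and function names are hypothetical and do not come from the R&B code. The guidance combination is the standard classifier-free guidance formula, not something taken from the paper. The layout-guidance update itself (Eqs. (6) and (11)) is not reproduced, because its exact form is not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingConfig:
    """Hyperparameters from the Experiment Setup row (field names are hypothetical)."""
    num_denoising_steps: int = 50          # DDIM denoising steps
    cfg_scale: float = 7.5                 # classifier-free guidance ratio
    layout_guidance_steps: int = 10        # layout guidance only in the first 10 steps
    dynamic_threshold_lambda: float = 0.4  # lambda for the dynamic threshold, Eq. (6)
    lambda_s: float = 1.5                  # ratio lambda_s in Eq. (11)
    lambda_a: float = 1.0                  # ratio lambda_a in Eq. (11)


def uses_layout_guidance(step_index: int, cfg: SamplingConfig) -> bool:
    """Return True if layout guidance applies at this 0-based step (in sampling order)."""
    if not 0 <= step_index < cfg.num_denoising_steps:
        raise ValueError("step_index out of range")
    return step_index < cfg.layout_guidance_steps


def classifier_free_guidance(eps_uncond, eps_cond, scale):
    """Standard CFG combination, element-wise: eps_uncond + scale * (eps_cond - eps_uncond)."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


cfg = SamplingConfig()
guided = [t for t in range(cfg.num_denoising_steps) if uses_layout_guidance(t, cfg)]
print(len(guided))  # 10
print(classifier_free_guidance([0.0, 1.0], [1.0, 1.0], cfg.cfg_scale))  # [7.5, 1.0]
```

Under these assumptions, layout guidance touches only 10 of the 50 DDIM iterations. The remaining 40 steps run plain classifier-free-guided denoising. Plain lists stand in for the noise-prediction tensors so that the sketch runs without extra dependencies.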