R&B: Region and Boundary Aware Zero-shot Grounded Text-to-image Generation
Authors: Jiayu Xiao, Henglei Lv, Liang Li, Shuhui Wang, Qingming Huang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our method outperforms existing state-of-the-art zero-shot grounded T2I generation methods by a large margin both qualitatively and quantitatively on several benchmarks. |
| Researcher Affiliation | Academia | (1) Key Lab of Intelligent Information Processing, ICT, CAS, Beijing, China; (2) University of Chinese Academy of Sciences, Beijing, China; (3) Peng Cheng Laboratory, Shenzhen, China |
| Pseudocode | No | The paper describes the steps of the method in paragraph text and refers to figures, but does not provide a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | Project page: https://sagileo.github.io/Region-and-Boundary. |
| Open Datasets | Yes | We use two benchmarks: HRS (Bakr et al., 2023) and Drawbench (Saharia et al., 2022). ... We select 100 samples from the MS-COCO (Lin et al., 2014) dataset and create triplets consisting of image caption, object phrases and bounding boxes. |
| Dataset Splits | No | The paper specifies the datasets used for evaluation but does not provide explicit training/validation/test splits, split percentages, or per-split sample counts for its experimental setup. |
| Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., GPU models, CPU types) used to run its experiments. It mentions hardware only in the context of a compared method (GLIGEN), not for the authors' own R&B method. |
| Software Dependencies | No | The paper mentions "Stable Diffusion V-1.5" as the base model and "DDIM scheduler," but it does not list specific software dependencies with version numbers (e.g., Python, PyTorch/TensorFlow versions, or other libraries). |
| Experiment Setup | Yes | We adopt the DDIM scheduler (Song et al., 2020a) with 50 denoising steps. The ratio of classifier-free guidance is set as 7.5. ... we only perform layout guidance at the first 10 steps. The λ for dynamic threshold in Eq. (6) is set as 0.4, the ratios λs and λa in Eq. (11) are set as 1.5 and 1.0 respectively. |
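
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object. The following is only a sketch: the class name `RnBSetup`, its field names, and the helper `uses_layout_guidance` are hypothetical and do not come from the paper or its code release, and it assumes zero-based step indexing, so "the first 10 steps" means indices 0 through 9.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RnBSetup:
    """Hypothetical container for the values reported in the Experiment Setup row."""

    base_model: str = "Stable Diffusion V-1.5"  # base model named in the paper
    scheduler: str = "DDIM"                     # Song et al., 2020a
    num_denoising_steps: int = 50
    cfg_scale: float = 7.5                      # classifier-free guidance ratio
    layout_guidance_steps: int = 10             # guidance only at the first 10 steps
    dynamic_threshold_lambda: float = 0.4       # lambda in Eq. (6)
    lambda_s: float = 1.5                       # lambda_s in Eq. (11)
    lambda_a: float = 1.0                       # lambda_a in Eq. (11)

    def uses_layout_guidance(self, step_index: int) -> bool:
        """Return True if layout guidance applies at this step (zero-based, assumed)."""
        return 0 <= step_index < self.layout_guidance_steps


setup = RnBSetup()
# Indices of the denoising steps that would receive layout guidance.
guided_steps = [i for i in range(setup.num_denoising_steps)
                if setup.uses_layout_guidance(i)]
```

Keeping the values in one frozen object makes it harder to drift from the reported settings by accident when attempting a reproduction; how each value is actually consumed depends on the authors' implementation, which this sketch does not model.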