HiCo: Hierarchical Controllable Diffusion Model for Layout-to-image Generation
Authors: Bo Cheng, Yuhang Ma, Liebucha Wu, Shanyuan Liu, Ao Ma, Xiaoyu Wu, Dawei Leng, Yuhui Yin
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the performance of multi-objective controllable layout generation in natural scenes, we introduce the HiCo-7K benchmark, derived from the GRIT-20M dataset and manually cleaned. [...] We conducted experiments on both the closed-set COCO dataset and the open-ended GRIT dataset, and achieved excellent performance on both. [...] Our method achieves state-of-the-art performance on both the open-ended HiCo-7K dataset and the closed-set COCO-3K[18] dataset. |
| Researcher Affiliation | Industry | 360 AI Research {chengbo1, mayuhang, wuliebucha, liushanyuan}@360.cn {maao, wuxiaoyu1, lengdawei, yinyuhui}@360.cn |
| Pseudocode | No | The paper describes the architecture and processes in text and diagrams but does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper currently does not provide open-source code. However, we are actively working on it and plan to release it as soon as possible. |
| Open Datasets | Yes | To evaluate the performance of multi-objective controllable layout generation in natural scenes, we introduce the HiCo-7K benchmark, derived from the GRIT-20M dataset and manually cleaned. https://github.com/360CVGroup/HiCo_T2I ... For training datasets, the fine-grained detailed description data comprises 1.2 million image-text pairs with regions and descriptions sourced from GRIT-20M[15]. ... For training datasets, the coarse-grained categorical description data, we select a subset of approximately 75K images from COCO Stuff[18] based on criteria such as region size, labeled as COCO-75K. (A schematic sketch of such a layout-annotated record follows the table.) |
| Dataset Splits | No | The paper mentions training and evaluation datasets (COCO-75K, GRIT-20M for training; COCO-3K, HiCo-7K for evaluation) but does not provide specific train/validation/test dataset splits with percentages, counts, or explicit standard split citations for reproduction. |
| Hardware Specification | Yes | We train HiCo with 8 A100 GPUs for 3 days. ... Specifically, we evaluated the inference time and GPU memory usage for directly generating 512×512 resolution images on the HiCo-7K using a 24GB-VRAM 3090 GPU. (A sketch of how such latency and memory measurements are typically taken appears after the table.) |
| Software Dependencies | No | The paper mentions various software components and models like Stable Diffusion, GLIDE, ControlNet, IP-Adapter, LoRA, LCM, SDXL-Lightning, Grounding-DINO, CLIP, YOLOv4, and GPT-4, but it does not specify their version numbers for reproducibility. (A sketch for recording installed versions follows the table.) |
| Experiment Setup | Yes | Specifically, for SD1.5, we utilize the AdamW optimizer with a fixed learning rate of 1e-5 and train the model for 50,000 iterations with a batch size of 256. (A minimal training-configuration sketch using these hyperparameters follows the table.) |
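
The fine-grained GRIT-derived training data pairs each image with a global caption plus region-level boxes and descriptions. Below is a minimal sketch of how such a layout-annotated record might be represented; the field names (`image_path`, `caption`, `regions`, `bbox`, `desc`) and the example values are illustrative assumptions, not the paper's actual schema.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical record layout for one layout-annotated image-text pair,
# loosely mirroring the GRIT-style data described in the paper
# (global caption plus per-region boxes and descriptions).
@dataclass
class RegionAnnotation:
    bbox: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), normalized to [0, 1]
    desc: str                                # fine-grained description of the region

@dataclass
class LayoutSample:
    image_path: str
    caption: str                     # global image caption
    regions: List[RegionAnnotation]  # one entry per layout box

# Example instance (values are illustrative only).
sample = LayoutSample(
    image_path="images/000001.jpg",
    caption="a dog playing with a ball on the grass",
    regions=[
        RegionAnnotation(bbox=(0.10, 0.40, 0.55, 0.95), desc="a brown dog running"),
        RegionAnnotation(bbox=(0.60, 0.70, 0.75, 0.85), desc="a red ball"),
    ],
)
```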
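
The hardware row reports inference time and GPU memory for 512×512 generation on a 24GB 3090. Below is a minimal sketch of how such measurements are commonly taken with PyTorch; `generate_image` is a hypothetical stand-in for the layout-to-image pipeline, which the paper does not expose.

```python
import time
import torch

def profile_generation(generate_image, prompt, layout, device="cuda"):
    """Measure wall-clock latency and peak GPU memory for one 512x512 sample.

    `generate_image` is a hypothetical callable standing in for the
    layout-to-image pipeline; it is not part of the paper.
    """
    torch.cuda.reset_peak_memory_stats(device)
    torch.cuda.synchronize(device)
    start = time.perf_counter()

    with torch.no_grad():
        image = generate_image(prompt=prompt, layout=layout, height=512, width=512)

    torch.cuda.synchronize(device)
    latency_s = time.perf_counter() - start
    peak_mem_gb = torch.cuda.max_memory_allocated(device) / 1024**3
    return image, latency_s, peak_mem_gb
```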
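
Because the software-dependency row names components without versions, a practical first step when attempting reproduction is to record the versions actually installed in the environment. The sketch below uses only the standard library; the package list is an assumption based on typical Stable Diffusion tooling, not a list provided by the paper.

```python
from importlib import metadata

# Assumed package list for a typical Stable Diffusion / ControlNet-style stack;
# adjust to whatever the reproduction environment actually uses.
packages = ["torch", "diffusers", "transformers", "accelerate", "xformers"]

for name in packages:
    try:
        print(f"{name}=={metadata.version(name)}")
    except metadata.PackageNotFoundError:
        print(f"{name}: not installed")
```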
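
The experiment-setup row maps directly onto a standard PyTorch training configuration (AdamW, fixed learning rate 1e-5, 50,000 iterations, batch size 256). Below is a minimal sketch under those hyperparameters; the tiny model, random data, and MSE loss are placeholders, since the SD1.5 UNet, the HiCo branches, and the diffusion loss are not reproducible from the text alone.

```python
import torch
from torch import nn
from torch.optim import AdamW
from torch.utils.data import DataLoader, TensorDataset

# Hyperparameters quoted from the paper's SD1.5 setup.
LEARNING_RATE = 1e-5
MAX_ITERATIONS = 50_000
BATCH_SIZE = 256  # reportedly trained on 8 A100 GPUs; in practice split per GPU or via grad accumulation

# Placeholder model and data so the loop runs end-to-end; the real setup would
# use the SD1.5 UNet with HiCo's hierarchical branches and the GRIT/COCO data.
model = nn.Linear(16, 16)
dataset = TensorDataset(torch.randn(1024, 16), torch.randn(1024, 16))
loader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True)

optimizer = AdamW(model.parameters(), lr=LEARNING_RATE)

step = 0
while step < MAX_ITERATIONS:
    for inputs, targets in loader:
        loss = nn.functional.mse_loss(model(inputs), targets)  # placeholder loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        step += 1
        if step >= MAX_ITERATIONS:
            break
```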