HiCo: Hierarchical Controllable Diffusion Model for Layout-to-image Generation

Authors: Bo Cheng, Yuhang Ma, Liebucha Wu, Shanyuan Liu, Ao Ma, Xiaoyu Wu, Dawei Leng, Yuhui Yin

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate the performance of multi-objective controllable layout generation in natural scenes, we introduce the HiCo-7K benchmark, derived from the GRIT-20M dataset and manually cleaned. [...] We conducted experiments on both the closed-set COCO dataset and the open-ended GRIT dataset, and achieved excellent performance on both. [...] Our method achieves state-of-the-art performance on both the open-ended HiCo-7K dataset and the closed-set COCO-3K[18] dataset.
Researcher Affiliation | Industry | 360 AI Research. {chengbo1, mayuhang, wuliebucha, liushanyuan}@360.cn, {maao, wuxiaoyu1, lengdawei, yinyuhui}@360.cn
Pseudocode | No | The paper describes the architecture and processes in text and diagrams but does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not yet provide open-source code; the authors state they are "actively working on it and plan to release it as soon as possible."
Open Datasets | Yes | To evaluate the performance of multi-objective controllable layout generation in natural scenes, we introduce the HiCo-7K benchmark, derived from the GRIT-20M dataset and manually cleaned (https://github.com/360CVGroup/HiCo_T2I). ... For the training datasets, the fine-grained detailed description data comprises 1.2 million image-text pairs with regions and descriptions sourced from GRIT-20M[15]. ... For the coarse-grained categorical description data, we select a subset of approximately 75K images from COCO-Stuff[18] based on criteria such as region size, labeled as COCO-75K. (A hypothetical sketch of this kind of region-size filtering follows the table.)
Dataset Splits | No | The paper mentions training and evaluation datasets (COCO-75K and GRIT-20M for training; COCO-3K and HiCo-7K for evaluation) but does not provide specific train/validation/test splits with percentages, counts, or explicit citations of standard splits for reproduction.
Hardware Specification | Yes | We train HiCo with 8 A100 GPUs for 3 days. ... Specifically, we evaluated the inference time and GPU memory usage for directly generating 512x512 resolution images on HiCo-7K using a 3090 GPU with 24 GB of VRAM.
Software Dependencies | No | The paper mentions various software components and models, such as Stable Diffusion, GLIDE, ControlNet, IP-Adapter, LoRA, LCM, SDXL-Lightning, Grounding-DINO, CLIP, YOLOv4, and GPT-4, but it does not specify their version numbers for reproducibility.
Experiment Setup | Yes | Specifically, for SD1.5, we utilize the AdamW optimizer with a fixed learning rate of 1e-5 and train the model for 50,000 iterations with a batch size of 256.
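
For context, the snippet below is a minimal sketch of the reported fine-tuning configuration. Only the AdamW optimizer, the fixed learning rate of 1e-5, the 50,000 iterations, and the batch size of 256 come from the paper; the SD1.5 checkpoint name, the standard epsilon-prediction loss, and the dummy batch tensors are illustrative assumptions, and HiCo's hierarchical layout branches are not reproduced here.

```python
import torch
import torch.nn.functional as F
from diffusers import UNet2DConditionModel, DDPMScheduler

# Assumed SD1.5 checkpoint; the paper only says "SD1.5".
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet")
scheduler = DDPMScheduler.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="scheduler")

optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)  # fixed LR, no schedule reported
MAX_ITERS = 50_000
BATCH = 256  # effective (global) batch size; per-GPU size on the 8 A100s is not reported

for step in range(MAX_ITERS):
    # Dummy tensors standing in for the real layout-image-text batches.
    latents = torch.randn(BATCH, 4, 64, 64)   # 512x512 images map to 64x64 SD latents
    text_emb = torch.randn(BATCH, 77, 768)    # CLIP text embedding shape for SD1.5
    noise = torch.randn_like(latents)
    t = torch.randint(0, scheduler.config.num_train_timesteps, (BATCH,))
    noisy = scheduler.add_noise(latents, noise, t)

    # Standard epsilon-prediction objective (assumption; HiCo's branch fusion is omitted).
    pred = unet(noisy, t, encoder_hidden_states=text_emb).sample
    loss = F.mse_loss(pred, noise)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```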
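
Similarly, the filtering that produces COCO-75K is described only as being "based on criteria such as region size". The sketch below shows one hypothetical way such a filter could be implemented with pycocotools; the annotation file path and the minimum-area ratio are assumptions, not values from the paper.

```python
from pycocotools.coco import COCO

coco = COCO("annotations/stuff_train2017.json")  # assumed COCO-Stuff annotation file
MIN_AREA_RATIO = 0.02                            # hypothetical minimum region-size ratio

kept_image_ids = []
for img_id in coco.getImgIds():
    img = coco.loadImgs(img_id)[0]
    anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))
    # Keep images whose labeled regions all cover a reasonable fraction of the image.
    if anns and all(a["area"] / (img["width"] * img["height"]) >= MIN_AREA_RATIO
                    for a in anns):
        kept_image_ids.append(img_id)

print(f"Kept {len(kept_image_ids)} of {len(coco.getImgIds())} images")
```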