HiCo: Hierarchical Controllable Diffusion Model for Layout-to-image Generation
Authors: Bo Cheng, Yuhang Ma, Liebucha Wu, Shanyuan Liu, Ao Ma, Xiaoyu Wu, Dawei Leng, Yuhui Yin
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the performance of multi-objective controllable layout generation in natural scenes, we introduce the HiCo-7K benchmark, derived from the GRIT-20M dataset and manually cleaned. [...] We conducted experiments on both the closed-set COCO dataset and the open-ended GRIT dataset, and achieved excellent performance on both. [...] Our method achieves state-of-the-art performance on both the open-ended HiCo-7K dataset and the closed-set COCO-3K[18] dataset. |
| Researcher Affiliation | Industry | 360 AI Research {chengbo1, mayuhang, wuliebucha, liushanyuan}@360.cn {maao, wuxiaoyu1, lengdawei, yinyuhui}@360.cn |
| Pseudocode | No | The paper describes the architecture and processes in text and diagrams but does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper currently does not provide open-source code. However, we are actively working on it and plan to release it as soon as possible. |
| Open Datasets | Yes | To evaluate the performance of multi-objective controllable layout generation in natural scenes, we introduce the HiCo-7K benchmark, derived from the GRIT-20M dataset and manually cleaned. https://github.com/360CVGroup/HiCo_T2I ... For training datasets, the fine-grained detailed description data comprises 1.2 million image-text pairs with regions and descriptions sourced from GRIT-20M[15]. ... For training datasets, the coarse-grained categorical description data, we select a subset of approximately 75K images from COCO Stuff[18] based on criteria such as region size, labeled as COCO-75K. (A schematic sketch of such a layout-annotated record follows the table.) |
| Dataset Splits | No | The paper mentions training and evaluation datasets (COCO-75K, GRIT-20M for training; COCO-3K, HiCo-7K for evaluation) but does not provide specific train/validation/test dataset splits with percentages, counts, or explicit standard split citations for reproduction. |
| Hardware Specification | Yes | We train HiCo with 8 A100 GPUs for 3 days. ... Specifically, we evaluated the inference time and GPU memory usage for directly generating 512×512 resolution images on the HiCo-7K using a 24GB-VRAM 3090 GPU. (A sketch of how such latency and memory measurements are typically taken appears after the table.) |
| Software Dependencies | No | The paper mentions various software components and models like Stable Diffusion, GLIDE, ControlNet, IP-Adapter, LoRA, LCM, SDXL-Lightning, Grounding-DINO, CLIP, YOLOv4, and GPT-4, but it does not specify their version numbers for reproducibility. (A sketch for recording installed versions follows the table.) |
| Experiment Setup | Yes | Specifically, for SD1.5, we utilize the AdamW optimizer with a fixed learning rate of 1e-5 and train the model for 50,000 iterations with a batch size of 256. (A minimal training-configuration sketch using these hyperparameters follows the table.) |
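
The fine-grained GRIT-derived training data pairs each image with a global caption plus region-level boxes and descriptions. Below is a minimal sketch of how such a layout-annotated record might be represented; the field names (`image_path`, `caption`, `regions`, `bbox`, `desc`) and the example values are illustrative assumptions, not the paper's actual schema.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical record layout for one layout-annotated image-text pair,
# loosely mirroring the GRIT-style data described in the paper
# (global caption plus per-region boxes and descriptions).
@dataclass
class RegionAnnotation:
    bbox: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), normalized to [0, 1]
    desc: str                                # fine-grained description of the region

@dataclass
class LayoutSample:
    image_path: str
    caption: str                     # global image caption
    regions: List[RegionAnnotation]  # one entry per layout box

# Example instance (values are illustrative only).
sample = LayoutSample(
    image_path="images/000001.jpg",
    caption="a dog playing with a ball on the grass",
    regions=[
        RegionAnnotation(bbox=(0.10, 0.40, 0.55, 0.95), desc="a brown dog running"),
        RegionAnnotation(bbox=(0.60, 0.70, 0.75, 0.85), desc="a red ball"),
    ],
)
```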
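
The hardware row reports inference time and GPU memory for 512×512 generation on a 24GB 3090. Below is a minimal sketch of how such measurements are commonly taken with PyTorch; `generate_image` is a hypothetical stand-in for the layout-to-image pipeline, which the paper does not expose.

```python
import time
import torch

def profile_generation(generate_image, prompt, layout, device="cuda"):
    """Measure wall-clock latency and peak GPU memory for one 512x512 sample.

    `generate_image` is a hypothetical callable standing in for the
    layout-to-image pipeline; it is not part of the paper.
    """
    torch.cuda.reset_peak_memory_stats(device)
    torch.cuda.synchronize(device)
    start = time.perf_counter()

    with torch.no_grad():
        image = generate_image(prompt=prompt, layout=layout, height=512, width=512)

    torch.cuda.synchronize(device)
    latency_s = time.perf_counter() - start
    peak_mem_gb = torch.cuda.max_memory_allocated(device) / 1024**3
    return image, latency_s, peak_mem_gb
```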
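
Because the software-dependency row names components without versions, a practical first step when attempting reproduction is to record the versions actually installed in the environment. The sketch below uses only the standard library; the package list is an assumption based on typical Stable Diffusion tooling, not a list provided by the paper.

```python
from importlib import metadata

# Assumed package list for a typical Stable Diffusion / ControlNet-style stack;
# adjust to whatever the reproduction environment actually uses.
packages = ["torch", "diffusers", "transformers", "accelerate", "xformers"]

for name in packages:
    try:
        print(f"{name}=={metadata.version(name)}")
    except metadata.PackageNotFoundError:
        print(f"{name}: not installed")
```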
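
The experiment-setup row maps directly onto a standard PyTorch training configuration (AdamW, fixed learning rate 1e-5, 50,000 iterations, batch size 256). Below is a minimal sketch under those hyperparameters; the tiny model, random data, and MSE loss are placeholders, since the SD1.5 UNet, the HiCo branches, and the diffusion loss are not reproducible from the text alone.

```python
import torch
from torch import nn
from torch.optim import AdamW
from torch.utils.data import DataLoader, TensorDataset

# Hyperparameters quoted from the paper's SD1.5 setup.
LEARNING_RATE = 1e-5
MAX_ITERATIONS = 50_000
BATCH_SIZE = 256  # reportedly trained on 8 A100 GPUs; in practice split per GPU or via grad accumulation

# Placeholder model and data so the loop runs end-to-end; the real setup would
# use the SD1.5 UNet with HiCo's hierarchical branches and the GRIT/COCO data.
model = nn.Linear(16, 16)
dataset = TensorDataset(torch.randn(1024, 16), torch.randn(1024, 16))
loader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True)

optimizer = AdamW(model.parameters(), lr=LEARNING_RATE)

step = 0
while step < MAX_ITERATIONS:
    for inputs, targets in loader:
        loss = nn.functional.mse_loss(model(inputs), targets)  # placeholder loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        step += 1
        if step >= MAX_ITERATIONS:
            break
```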