GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation

Authors: Kai Chen, Enze Xie, Zhe Chen, Yibo Wang, Lanqing Hong, Zhenguo Li, Dit-Yan Yeung

ICLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments demonstrate GEODIFFUSION outperforms previous L2I methods while maintaining 4× faster training time.
Researcher Affiliation Collaboration 1Hong Kong University of Science and Technology, 2Huawei Noah's Ark Lab, 3Nanjing University, 4Tsinghua University
Pseudocode No The paper does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code No The paper provides a 'Project Page' link (https://kaichen1998.github.io/projects/geodiffusion/), but it does not contain an unambiguous statement that the source code for the methodology is openly released or a direct link to a code repository.
Open Datasets Yes Our experiments primarily utilize the widely used nuImages (Caesar et al., 2020) dataset, which consists of 60K training samples and 15K validation samples with high-quality bounding box annotations from 10 semantic classes. Moreover, to showcase the universality of GEODIFFUSION for common layout-to-image settings, we present experimental results on COCO (Lin et al., 2014; Caesar et al., 2018).
Dataset Splits Yes Our experiments primarily utilize the widely used nuImages (Caesar et al., 2020) dataset, which consists of 60K training samples and 15K validation samples with high-quality bounding box annotations from 10 semantic classes.
Hardware Specification Yes We gratefully acknowledge the support of MindSpore, CANN (Compute Architecture for Neural Networks) and Ascend AI Processor used for this research.
Software Dependencies Yes We initialize the embedding matrix of the location tokens with 2D sine-cosine embeddings (Vaswani et al., 2017), while the remaining parameters of GEODIFFUSION are initialized with Stable Diffusion (v1.5), a pre-trained text-to-image diffusion model based on LDM (Rombach et al., 2022).
Experiment Setup Yes The batch size is set to 64, and learning rates are set to 4e-5 for U-Net and 3e-5 for the text encoder. Layer-wise learning rate decay (Clark et al., 2020) is further adopted for the text encoder, with a decay ratio of 0.95. With 10% probability, the text prompt is replaced with a null text for unconditional generation. We fine-tune our GEODIFFUSION for 64 epochs, while baseline methods are trained for 256 epochs to maintain a similar training budget with the COCO recipe in (Sun & Wu, 2019; Li et al., 2021; Jahn et al., 2021). During generation, we sample images using the PLMS (Liu et al., 2022a) scheduler for 100 steps with the classifier-free guidance (CFG) set as 5.0.
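The 2D sine-cosine initialization of the location-token embeddings quoted above follows the standard construction: a 1D sinusoidal embedding (Vaswani et al., 2017) of the row index is concatenated with one of the column index. A minimal sketch is below; the 32×32 grid size and 768-dim hidden width are illustrative assumptions, not values stated in the report.

```python
import numpy as np

def sincos_1d(dim, positions):
    # Standard 1D sinusoidal embedding: half the channels get sin, half
    # get cos, with geometrically increasing wavelengths over the channels.
    omega = 1.0 / (10000 ** (np.arange(dim // 2) / (dim // 2)))
    angles = np.outer(positions, omega)                    # (N, dim/2)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)  # (N, dim)

def sincos_2d(dim, grid_h, grid_w):
    # 2D version: embed the row index and the column index separately,
    # each with half the channels, and concatenate.
    ys, xs = np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij")
    emb_y = sincos_1d(dim // 2, ys.reshape(-1))
    emb_x = sincos_1d(dim // 2, xs.reshape(-1))
    return np.concatenate([emb_y, emb_x], axis=1)          # (grid_h*grid_w, dim)

# Hypothetical numbers: a 32x32 grid of location tokens with hidden
# size 768 (the width of Stable Diffusion v1.5's text encoder).
weights = sincos_2d(768, 32, 32)
print(weights.shape)  # (1024, 768)
```

The resulting matrix can then be used to initialize the location-token rows of the text encoder's embedding table before fine-tuning.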
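Two ingredients of the training recipe quoted above are easy to sketch: layer-wise learning rate decay for the text encoder, and the 10% null-text replacement that enables classifier-free guidance. The 12-layer depth below is an assumption (the depth of CLIP's ViT-L/14 text encoder); the report only states the base learning rate (3e-5), the decay ratio (0.95), and the drop probability (10%).

```python
import random

def layerwise_lrs(num_layers, base_lr=3e-5, decay=0.95):
    # Layer-wise learning rate decay (Clark et al., 2020): layer i
    # (0 = closest to the input) trains with base_lr * decay**(L - 1 - i),
    # so earlier, more generic layers are updated more conservatively.
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]

def maybe_drop_prompt(prompt, p_uncond=0.10):
    # With probability p_uncond the prompt is replaced by a null text,
    # teaching the model an unconditional mode for classifier-free guidance.
    return "" if random.random() < p_uncond else prompt

# Assumed 12 transformer layers in the text encoder.
lrs = layerwise_lrs(12)
print(f"first layer lr: {lrs[0]:.2e}, last layer lr: {lrs[-1]:.2e}")
```

At sampling time, the unconditional prediction learned via the null text is combined with the conditional one as eps_uncond + w * (eps_cond - eps_uncond), with the guidance scale w = 5.0 quoted in the setup.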