Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation
Authors: Kai Chen, Enze Xie, Zhe Chen, Yibo Wang, Lanqing HONG, Zhenguo Li, Dit-Yan Yeung
ICLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate GEODIFFUSION outperforms previous L2I methods while maintaining 4× training time faster. |
| Researcher Affiliation | Collaboration | ¹Hong Kong University of Science and Technology, ²Huawei Noah's Ark Lab, ³Nanjing University, ⁴Tsinghua University |
| Pseudocode | No | The paper does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper provides a 'Project Page' link (https://kaichen1998.github.io/projects/geodiffusion/), but it does not contain an unambiguous statement that the source code for the methodology is openly released or a direct link to a code repository. |
| Open Datasets | Yes | Our experiments primarily utilize the widely used NuImages (Caesar et al., 2020) dataset, which consists of 60K training samples and 15K validation samples with high-quality bounding box annotations from 10 semantic classes. Moreover, to showcase the universality of GEODIFFUSION for common layout-to-image settings, we present experimental results on COCO (Lin et al., 2014; Caesar et al., 2018). |
| Dataset Splits | Yes | Our experiments primarily utilize the widely used NuImages (Caesar et al., 2020) dataset, which consists of 60K training samples and 15K validation samples with high-quality bounding box annotations from 10 semantic classes. |
| Hardware Specification | Yes | We gratefully acknowledge the support of the MindSpore, CANN (Compute Architecture for Neural Networks) and Ascend AI Processor used for this research. |
| Software Dependencies | Yes | We initialize the embedding matrix of the location tokens with 2D sine-cosine embeddings (Vaswani et al., 2017), while the remaining parameters of GEODIFFUSION are initialized with Stable Diffusion (v1.5), a pre-trained text-to-image diffusion model based on LDM (Rombach et al., 2022). |
| Experiment Setup | Yes | The batch size is set to 64, and learning rates are set to 4e-5 for U-Net and 3e-5 for the text encoder. Layer-wise learning rate decay (Clark et al., 2020) is further adopted for the text encoder, with a decay ratio of 0.95. With 10% probability, the text prompt is replaced with a null text for unconditional generation. We fine-tune our GEODIFFUSION for 64 epochs, while baseline methods are trained for 256 epochs to maintain a similar training budget with the COCO recipe in (Sun & Wu, 2019; Li et al., 2021; Jahn et al., 2021). During generation, we sample images using the PLMS (Liu et al., 2022a) scheduler for 100 steps with the classifier-free guidance (CFG) set as 5.0. |
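
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration, which also makes the effect of layer-wise learning rate decay concrete. The following is a minimal sketch, not the authors' code: the names `FineTuneConfig` and `layerwise_lrs` are hypothetical, the 12-layer text encoder depth is an assumption not stated in the excerpt, and the indexing convention (top layer keeps the base rate, each layer below is scaled by the decay ratio once more) is one common reading of layer-wise decay that the paper may implement differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Values copied from the Experiment Setup row; field names are hypothetical."""
    batch_size: int = 64
    unet_lr: float = 4e-5
    text_encoder_lr: float = 3e-5
    layer_decay: float = 0.95      # layer-wise LR decay ratio for the text encoder
    null_text_prob: float = 0.10   # chance of replacing the prompt with null text
    epochs: int = 64
    sampler: str = "PLMS"
    sampling_steps: int = 100
    cfg_scale: float = 5.0         # classifier-free guidance scale


def layerwise_lrs(base_lr: float, decay: float, num_layers: int) -> list[float]:
    """Return one learning rate per layer, ordered from bottom (index 0) to top.

    Assumed convention: the top layer uses base_lr, and every step toward
    the input multiplies the rate by `decay` once more.
    """
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]


cfg = FineTuneConfig()
# num_layers=12 is an assumption about the text encoder depth.
lrs = layerwise_lrs(cfg.text_encoder_lr, cfg.layer_decay, num_layers=12)

for i, lr in enumerate(lrs):
    print(f"text encoder layer {i:2d}: lr = {lr:.3e}")
```

Under these assumptions the bottom layer trains at a little over half the rate of the top layer, so the layers closest to the input stay nearer to their pre-trained values.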