Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
TextCenGen: Attention-Guided Text-Centric Background Adaptation for Text-to-Image Generation
Authors: Tianyi Liang, Jiangqi Liu, Yifei Huang, Shiqi Jiang, Jianshen Shi, Changbo Wang, Chenhui Li
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluated on our proposed text-friendly T2I benchmark of 27,000 images across four seed datasets, TextCenGen outperforms existing methods by achieving 23% lower saliency overlap in text regions while maintaining 98% of the semantic fidelity measured by CLIP score and our proposed Visual-Textual Concordance Metric (VTCM). |
| Researcher Affiliation | Academia | 1College of Computer Science and Technology, East China Normal University, Shanghai, China 2Shanghai Institute of AI Education, Shanghai, China 3Shanghai Artificial Intelligence Laboratory, Shanghai, China. Correspondence to: Chenhui Li <EMAIL>. |
| Pseudocode | No | The paper describes the method and its components (Force-Directed Cross-Attention Guidance, Spatial Excluding Cross-Attention Constraint) in detail with mathematical formulas and explanations. However, it does not include a distinct block explicitly labeled as 'Pseudocode' or 'Algorithm', nor does it present structured, code-like steps that would qualify as pseudocode. |
| Open Source Code | Yes | Open source code at: https://github.com/tianyilt/TextCenGen_Background_Adapt |
| Open Datasets | Yes | Evaluated on our proposed text-friendly T2I benchmark of 27,000 images across four seed datasets ...Prompt2Prompt template (Hertz et al., 2022) ... Diffusion DB prompts (Wang et al., 2022) ... a targeted Desigen benchmark using 771 images from the Desigen dataset validation set (Weng et al., 2024) |
| Dataset Splits | Yes | Our evaluation contains 27,000 images generated from 2,700 unique prompts, each tested in ten different random regions R. ...a targeted Desigen benchmark using 771 images from the Desigen dataset validation set (Weng et al., 2024) |
| Hardware Specification | Yes | We use one A6000 and ten A40 GPUs for evaluation. Our cross-attention replacement method requires less than 15GB of VRAM, making it feasible to run inference on consumer GPUs like the RTX 3090. For evaluation purposes, we utilized one NVIDIA A6000 and 8 H800 GPUs. |
| Software Dependencies | No | Our model is built with Diffusers. The pre-trained models are stable-diffusion-v1-5 and stable-diffusion-v2-0, used with the DDPM Scheduler. The paper names the Diffusers library and specific pre-trained model versions (e.g., stable-diffusion-v1-5), but does not provide a version number for the Diffusers library itself. |
| Experiment Setup | Yes | While generating, the size of the output images is 512 × 512. ... we have set the force balance constant α to 0.5. The coefficient for the regularization term γ is fixed at 0.01. Within the detector, we upscale all cross-attention maps to a 64 × 64 resolution. Additionally, we expand the height and width of the region R by a margin of 0.06. During the first 20 steps, we identify conflicting objects when the Intersection over Union (IoU) exceeds 0.14. For the subsequent 30 steps, we initiate a push operation only if the average density inside region R surpasses 0.8. Negative prompts are: monocolor, monotony, cartoon style, many texts, pure cloud, pure sea, extra texts, texts, monochrome, flattened, lowres, longbody, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality. For our experiments, we set the sampling steps to 50 and the classifier-free guidance scale at 7.5. |
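The Experiment Setup row quotes two geometric rules: the text region R is expanded by a margin of 0.06, and an object counts as conflicting when its IoU with R exceeds 0.14 during the first 20 steps. A minimal sketch of those checks is below; the box representation (normalized `(x0, y0, x1, y1)` corners) and the helper names are assumptions for illustration, not taken from the paper or its code.

```python
IOU_CONFLICT_THRESHOLD = 0.14  # from the paper: applied during the first 20 steps
DENSITY_PUSH_THRESHOLD = 0.8   # from the paper: push trigger for the last 30 steps
REGION_MARGIN = 0.06           # from the paper: expansion of region R

def expand_region(box, margin=REGION_MARGIN):
    """Grow a normalized (x0, y0, x1, y1) box by `margin`, clamped to [0, 1]."""
    x0, y0, x1, y1 = box
    return (max(0.0, x0 - margin), max(0.0, y0 - margin),
            min(1.0, x1 + margin), min(1.0, y1 + margin))

def iou(a, b):
    """Intersection over Union of two normalized axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def is_conflicting(object_box, text_region):
    """True if the object's box overlaps the expanded text region past the threshold."""
    return iou(object_box, expand_region(text_region)) > IOU_CONFLICT_THRESHOLD
```

In the actual method the object boxes would come from the upscaled 64 × 64 cross-attention maps; this sketch only illustrates the thresholding arithmetic on already-extracted boxes.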