Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
TextCenGen: Attention-Guided Text-Centric Background Adaptation for Text-to-Image Generation
Authors: Tianyi Liang, Jiangqi Liu, Yifei Huang, Shiqi Jiang, Jianshen Shi, Changbo Wang, Chenhui Li
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluated on our proposed text-friendly T2I benchmark of 27,000 images across four seed datasets, TextCenGen outperforms existing methods by achieving 23% lower saliency overlap in text regions while maintaining 98% of the semantic fidelity measured by CLIP score and our proposed Visual-Textual Concordance Metric (VTCM). |
| Researcher Affiliation | Academia | 1College of Computer Science and Technology, East China Normal University, Shanghai, China 2Shanghai Institute of AI Education, Shanghai, China 3Shanghai Artificial Intelligence Laboratory, Shanghai, China. Correspondence to: Chenhui Li <EMAIL>. |
| Pseudocode | No | The paper describes the method and its components (Force-Directed Cross-Attention Guidance, Spatial Excluding Cross-Attention Constraint) in detail with mathematical formulas and explanations. However, it does not include a distinct block explicitly labeled as 'Pseudocode' or 'Algorithm', nor does it present structured, code-like steps that would qualify as pseudocode. |
| Open Source Code | Yes | Open source code at: https://github.com/tianyilt/TextCenGen_Background_Adapt |
| Open Datasets | Yes | Evaluated on our proposed text-friendly T2I benchmark of 27,000 images across four seed datasets ...Prompt2Prompt template (Hertz et al., 2022) ... Diffusion DB prompts (Wang et al., 2022) ... a targeted Desigen benchmark using 771 images from the Desigen dataset validation set (Weng et al., 2024) |
| Dataset Splits | Yes | Our evaluation contains 27,000 images generated from 2,700 unique prompts, each tested in ten different random regions R. ...a targeted Desigen benchmark using 771 images from the Desigen dataset validation set (Weng et al., 2024) |
| Hardware Specification | Yes | We use one A6000 and ten A40 GPUs for evaluation. Our cross-attention replacement method requires less than 15GB of VRAM, making it feasible to run inference on consumer GPUs like the RTX 3090. For evaluation purposes, we utilized one NVIDIA A6000 and 8 H800 GPUs. |
| Software Dependencies | No | Our model is built with Diffusers. The pre-trained models are stable-diffusion-v1-5 and stable-diffusion-v2-0, used with the DDPM Scheduler. The paper names the Diffusers library and specific pre-trained model versions (e.g., stable-diffusion-v1-5), but does not provide a version number for the Diffusers library itself. |
| Experiment Setup | Yes | While generating, the size of the output images is 512 × 512. ... we have set the force balance constant α to 0.5. The coefficient for the regularization term γ is fixed at 0.01. Within the detector, we upscale all cross-attention maps to a 64 × 64 resolution. Additionally, we expand the height and width of the region R by a margin of 0.06. During the first 20 steps, we identify conflicting objects when the Intersection over Union (IoU) exceeds 0.14. For the subsequent 30 steps, we initiate a push operation only if the average density inside region R surpasses 0.8. Negative prompts are: monocolor, monotony, cartoon style, many texts, pure cloud, pure sea, extra texts, texts, monochrome, flattened, lowres, longbody, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality. For our experiments, we set the sampling steps to 50 and the classifier-free guidance scale at 7.5. |
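The Experiment Setup row quotes two geometric rules: the text region R is expanded by a margin of 0.06, and an object counts as conflicting when its IoU with R exceeds 0.14 during the first 20 steps. A minimal sketch of those checks is below; the box representation (normalized `(x0, y0, x1, y1)` corners) and the helper names are assumptions for illustration, not taken from the paper or its code.

```python
IOU_CONFLICT_THRESHOLD = 0.14  # from the paper: applied during the first 20 steps
DENSITY_PUSH_THRESHOLD = 0.8   # from the paper: push trigger for the last 30 steps
REGION_MARGIN = 0.06           # from the paper: expansion of region R

def expand_region(box, margin=REGION_MARGIN):
    """Grow a normalized (x0, y0, x1, y1) box by `margin`, clamped to [0, 1]."""
    x0, y0, x1, y1 = box
    return (max(0.0, x0 - margin), max(0.0, y0 - margin),
            min(1.0, x1 + margin), min(1.0, y1 + margin))

def iou(a, b):
    """Intersection over Union of two normalized axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def is_conflicting(object_box, text_region):
    """True if the object's box overlaps the expanded text region past the threshold."""
    return iou(object_box, expand_region(text_region)) > IOU_CONFLICT_THRESHOLD
```

In the actual method the object boxes would come from the upscaled 64 × 64 cross-attention maps; this sketch only illustrates the thresholding arithmetic on already-extracted boxes.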