Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Information Theoretic Text-to-Image Alignment
Authors: Chao Wang, Giulio Franzese, Alessandro Finamore, Massimo Gallo, Pietro Michiardi
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our analysis indicates that our method is superior to the state-of-the-art, yet it only requires the pre-trained denoising network of the T2I model itself to estimate MI, and a simple finetuning strategy that improves alignment while maintaining image quality. Code available at https://github.com/Chao0511/mitune. [...] We perform an extensive experimental campaign using a recent T2I benchmark suite (Huang et al., 2023) and SD-2.1-base as base model obtaining sizable improvement compared to six alternative methods (§4). |
| Researcher Affiliation | Collaboration | Chao Wang¹,², Giulio Franzese¹, Alessandro Finamore², Massimo Gallo², Pietro Michiardi¹; EURECOM¹, Huawei Technologies SASU, France²; ¹EMAIL ²EMAIL |
| Pseudocode | Yes | Algorithm 1: MI-TUNE [...] Algorithm 2: Point-wise MI Estimation |
| Open Source Code | Yes | Code available at https://github.com/Chao0511/mitune. |
| Open Datasets | Yes | We compare all techniques using T2I-CompBench (Huang et al., 2023), a benchmark composed of 700/300 (train/test) prompts across 6 categories [...] We also assess MI-TUNE performance on more realistic prompts by sampling 5,000/1,250 (train/test) prompt-image pairs from DiffusionDB (Wang et al., 2022) [...] we compute the metrics using 30k samples of the MS-COCO-2014 (Lin et al., 2015) validation set. |
| Dataset Splits | Yes | We compare all techniques using T2I-CompBench (Huang et al., 2023), a benchmark composed of 700/300 (train/test) prompts across 6 categories [...] We also assess MI-TUNE performance on more realistic prompts by sampling 5,000/1,250 (train/test) prompt-image pairs from DiffusionDB (Wang et al., 2022) |
| Hardware Specification | Yes | GPUs for Training 1 NVIDIA A100 [...] on a single A100-80GB GPU |
| Software Dependencies | No | Table 8: Training hyperparameters. Trainable model UNET [...] PEFT DoRA (Liu et al., 2024) Rank 32 α 32 [...] Optimizer AdamW. This table lists training components and techniques, not specific software library versions like Python, PyTorch, or CUDA, which are required for a "Yes" answer. |
| Experiment Setup | Yes | Table 8: Training hyperparameters. Trainable model: UNET [...] PEFT: DoRA (Liu et al., 2024); Rank: 32; α: 32; Learning rate (LR): 1e-4; Gradient norm clipping: 1.0; LR scheduler: Constant; LR warmup steps: 0; Optimizer: AdamW; AdamW β1: 0.9; AdamW β2: 0.999; AdamW weight decay: 1e-2; AdamW ϵ: 1e-8; Resolution: 512 × 512; Classifier-free guidance scale: 7.5; Denoising steps: 50; Batch size: 400; Training iterations: 300 |
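
For readers who want the Experiment Setup values in machine-readable form, the following is a minimal sketch that collects the Table 8 hyperparameters quoted above into a Python dataclass. The class name and all field names are hypothetical and are not taken from the authors' repository; only the values come from the quoted table. The elided portion of the table (`[...]`) is not reconstructed.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class MiTuneTrainingConfig:
    """Values quoted from Table 8; the field names are hypothetical."""
    trainable_model: str = "UNET"
    peft_method: str = "DoRA"          # Liu et al., 2024
    rank: int = 32
    alpha: int = 32
    learning_rate: float = 1e-4
    grad_norm_clip: float = 1.0
    lr_scheduler: str = "constant"
    lr_warmup_steps: int = 0
    optimizer: str = "AdamW"
    adam_beta1: float = 0.9
    adam_beta2: float = 0.999
    adam_weight_decay: float = 1e-2
    adam_epsilon: float = 1e-8
    resolution: tuple = (512, 512)     # width x height in pixels
    guidance_scale: float = 7.5        # classifier-free guidance
    denoising_steps: int = 50
    batch_size: int = 400
    training_iterations: int = 300


config = MiTuneTrainingConfig()

# Number of samples processed over the whole run (batch size x iterations),
# counting repeats if the training set is smaller than this.
samples_processed = config.batch_size * config.training_iterations

if __name__ == "__main__":
    for name, value in asdict(config).items():
        print(f"{name}: {value}")
    print(f"samples_processed: {samples_processed}")
```

With the quoted batch size of 400 and 300 training iterations, the run processes 120,000 samples in total, which gives a rough sense of the fine-tuning budget on the single A100 reported under Hardware Specification.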