Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Information Theoretic Text-to-Image Alignment

Authors: Chao Wang, Giulio Franzese, Alessandro Finamore, Massimo Gallo, Pietro Michiardi

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our analysis indicates that our method is superior to the state-of-the-art, yet it only requires the pre-trained denoising network of the T2I model itself to estimate MI, and a simple finetuning strategy that improves alignment while maintaining image quality. Code available at https://github.com/Chao0511/mitune. [...] We perform an extensive experimental campaign using a recent T2I benchmark suite (Huang et al., 2023) and SD-2.1-base as base model obtaining sizable improvement compared to six alternative methods (§4).
Researcher Affiliation	Collaboration	Chao Wang1,2, Giulio Franzese1, Alessandro Finamore2, Massimo Gallo2, Pietro Michiardi1 EURECOM1, Huawei Technologies SASU, France2 1EMAIL 2EMAIL
Pseudocode	Yes	Algorithm 1: MI-TUNE [...] Algorithm 2: Point-wise MI Estimation
Open Source Code	Yes	Code available at https://github.com/Chao0511/mitune.
Open Datasets	Yes	We compare all techniques using T2I-CompBench (Huang et al., 2023), a benchmark composed of 700/300 (train/test) prompts across 6 categories [...] We also assess MI-TUNE performance on more realistic prompts by sampling 5,000/1,250 (train/test) prompt-image pairs from DiffusionDB (Wang et al., 2022) [...] we compute the metrics using 30k samples of the MS-COCO-2014 (Lin et al., 2015) validation set.
Dataset Splits	Yes	We compare all techniques using T2I-CompBench (Huang et al., 2023), a benchmark composed of 700/300 (train/test) prompts across 6 categories [...] We also assess MI-TUNE performance on more realistic prompts by sampling 5,000/1,250 (train/test) prompt-image pairs from DiffusionDB (Wang et al., 2022)
Hardware Specification	Yes	GPUs for Training 1 NVIDIA A100 [...] on a single A100-80GB GPU
Software Dependencies	No	Table 8: Training hyperparameters. Trainable model: UNET [...] PEFT: DoRA (Liu et al., 2024); Rank: 32; α: 32 [...] Optimizer: AdamW. This table lists training components and techniques, not specific software library versions like Python, PyTorch, or CUDA, which are required for a "Yes" answer.
Experiment Setup	Yes	Table 8: Training hyperparameters. Trainable model: UNET [...] PEFT: DoRA (Liu et al., 2024); Rank: 32; α: 32; Learning rate (LR): 1e-4; Gradient norm clipping: 1.0; LR scheduler: Constant; LR warmup steps: 0; Optimizer: AdamW; AdamW β1: 0.9; AdamW β2: 0.999; AdamW weight decay: 1e-2; AdamW ϵ: 1e-8; Resolution: 512 × 512; Classifier-free guidance scale: 7.5; Denoising steps: 50; Batch size: 400; Training iterations: 300
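
For readers who want the Experiment Setup values in machine-readable form, the following is a minimal sketch that transcribes the Table 8 excerpt into a Python dataclass. It is not taken from the authors' repository: the class name `FinetuneConfig`, the field names, and the `samples_seen` helper are hypothetical, and the exponents (1e-4, 1e-2, 1e-8) assume that the minus signs were lost when the table was extracted.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class FinetuneConfig:
    """Hyperparameters transcribed from the Table 8 excerpt (names are hypothetical)."""
    trainable_model: str = "UNET"
    peft_method: str = "DoRA"
    rank: int = 32
    alpha: int = 32
    learning_rate: float = 1e-4      # assumes "1e 4" lost its minus sign
    grad_norm_clip: float = 1.0
    lr_scheduler: str = "constant"
    lr_warmup_steps: int = 0
    optimizer: str = "AdamW"
    adam_beta1: float = 0.9
    adam_beta2: float = 0.999
    adam_weight_decay: float = 1e-2  # assumes "1e 2" lost its minus sign
    adam_epsilon: float = 1e-8       # assumes "1e 8" lost its minus sign
    resolution: tuple = (512, 512)
    guidance_scale: float = 7.5
    denoising_steps: int = 50
    batch_size: int = 400
    training_iterations: int = 300

    def samples_seen(self) -> int:
        # Assumes every iteration consumes exactly one full batch.
        return self.batch_size * self.training_iterations


config = FinetuneConfig()
print(asdict(config))
print(config.samples_seen())
```

Under the one-batch-per-iteration assumption, `samples_seen()` is simply the product of the reported batch size and iteration count; the source does not state this figure itself.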