Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Entropy Rectifying Guidance for Diffusion and Flow Models

Authors: Tariq Berrada, Adriana Romero-Soriano, Michal Drozdzal, Jakob J. Verbeek, Karteek Alahari

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We show that ERG results in significant improvements in various tasks, including text-to-image, class-conditional and unconditional image generation. We also show that ERG can be seamlessly combined with other recent guidance methods such as CADS and APG, further improving generation results. ... 4 Experimental evaluation
Researcher Affiliation	Collaboration	1 FAIR at Meta; 2 Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, France; 3 McGill University; 4 Mila, Quebec AI Institute; 5 Canada CIFAR AI Chair
Pseudocode	Yes	Algorithm 1 Entropy rectifying guidance
Open Source Code	Yes	By releasing our code transparently, we provide a way for researchers to study and counter the potential harmful effects of our method being misused, allowing for the development of defense strategies.
Open Datasets	Yes	We use a face-blurred version of ImageNet (Deng et al., 2009) to train class-conditional models at 256 and 512 resolution... For the text-to-image model, we use an architecture similar to MMDiT (Esser et al., 2024), and train a 512 resolution model on a mix of a proprietary dataset of 320M text-image pairs and YFCC100M (Thomee et al., 2016)... For evaluation of text-to-image and unconditional generation, we use the 40k COCO 14 validation image-caption pairs.
Dataset Splits	Yes	For evaluation of text-to-image and unconditional generation, we use the 40k COCO 14 validation image-caption pairs. For the class-conditional models, we sample 50 images for each of the 1,000 ImageNet classes and use the ImageNet validation set as a reference.
Hardware Specification	No	The paper does not explicitly mention specific hardware details like GPU models, CPU types, or cloud instance specifications used for running the experiments.
Software Dependencies	No	The paper mentions using models like Llama3-8B and Flan-T5-XL, and libraries such as EvalGIM, but it does not specify explicit version numbers for software components (e.g., PyTorch 1.x, Python 3.x), which are required for a reproducible description of ancillary software.
Experiment Setup	Yes	All evaluated models are sampled using the Euler method with 50 sampling steps. We use the EvalGIM (Hall et al., 2024) library for all evaluations. Baselines. In addition to the standard classifier-free guidance, we compare our method to several recent state-of-the-art guidance techniques: Condition-Annealed Diffusion Sampler (CADS) (Sadat et al., 2024), Adaptive Projected Guidance (APG) (Sadat et al., 2025), Smooth Energy Guidance (SEG) (Hong, 2024), and Auto-Guidance (Karras et al., 2024). For APG, we follow the recommendations from the paper and set γ_APG = 0.5, η_APG = 0.0, r_APG = 5.0. For CADS, we perform a grid search over τ_1^CADS ∈ [0.6, 0.8], τ_2^CADS ∈ [0.8, 1.0], s_CADS ∈ [0.25, 1.0], ψ_CADS = 1.0. ... We used K = γ = 1 in our default setup in our experiments, unless specified otherwise.
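
For readers unfamiliar with the sampling setup quoted in the Experiment Setup row, the following is a minimal sketch of Euler sampling with 50 steps combined with standard classifier-free guidance. It does not implement ERG or any baseline from the paper. The function names (`cfg_velocity`, `euler_sample`, `toy_velocity`), the time convention (integrating from t = 0 to t = 1), and the toy velocity field standing in for a trained model are all assumptions made for illustration.

```python
from typing import Callable, List, Optional

Vector = List[float]
# Hypothetical model interface: velocity(x, t, cond) -> predicted velocity.
VelocityFn = Callable[[Vector, float, Optional[Vector]], Vector]


def cfg_velocity(v_cond: Vector, v_uncond: Vector, scale: float) -> Vector:
    """Classifier-free guidance: move from the unconditional prediction
    toward the conditional one, scaled by `scale` (scale = 1 is unguided)."""
    return [u + scale * (c - u) for c, u in zip(v_cond, v_uncond)]


def euler_sample(
    velocity: VelocityFn,
    x0: Vector,
    cond: Vector,
    guidance_scale: float = 1.0,
    num_steps: int = 50,
) -> Vector:
    """Integrate dx/dt = v(x, t) with fixed-step Euler updates.

    Assumes time runs from 0 to 1; real samplers may use a different
    schedule or direction.
    """
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v_c = velocity(x, t, cond)   # conditional prediction
        v_u = velocity(x, t, None)   # unconditional prediction
        v = cfg_velocity(v_c, v_u, guidance_scale)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(x: Vector, t: float, cond: Optional[Vector]) -> Vector:
    """Toy stand-in for a trained network: pulls x toward `cond`,
    or toward the origin when no condition is given."""
    target = cond if cond is not None else [0.0] * len(x)
    return [ti - xi for ti, xi in zip(target, x)]


if __name__ == "__main__":
    sample = euler_sample(toy_velocity, [0.0, 0.0], [1.0, -1.0], guidance_scale=2.0)
    print(sample)
```

In this toy setting, a guidance scale above 1 pushes the sample past the conditional target, which mirrors in a simplified way how stronger guidance trades diversity for condition adherence.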