Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

FairImagen: Post-Processing for Bias Mitigation in Text-to-Image Models

Authors: Zihao Fu, Ryan Brown, Shun Shao, Kai Rawal, Eoin Delaney, Chris Russell

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments across gender, race, and intersectional settings demonstrate that FairImagen significantly improves fairness with a moderate trade-off in image quality and prompt fidelity. Our framework outperforms existing post-hoc methods and offers a simple, scalable, and model-agnostic solution for equitable text-to-image generation.
Researcher Affiliation	Academia	Zihao Fu The Chinese University of Hong Kong EMAIL Ryan Brown University of Oxford EMAIL Shun Shao University of Cambridge EMAIL Kai Rawal University of Oxford EMAIL Eoin Delaney Trinity College Dublin EMAIL Chris Russell University of Oxford EMAIL
Pseudocode	No	The paper describes the framework and methodology in detail within the main text (e.g., sections 3.1, 3.2, 3.3, 3.4), but it does not include a formally structured pseudocode block or algorithm.
Open Source Code	Yes	1 https://github.com/fuzihaofzh/FairImagen
Open Datasets	Yes	The Winobias dataset consists of 46 professions, collected from the US Bureau of Labor Statistics, that are stereotypically considered male biased or female biased [31, 30, 34, 35]. In our experiments, we extend this list to 120 professions using publicly available lists². ... ² The full list is included in our supplementary material with the code. We manually extended the Winobias list using a publicly available list of occupations from Wikipedia: https://en.wikipedia.org/wiki/Lists_of_occupations
Dataset Splits	Yes	We split the dataset into a development set of 20 samples and use the remaining 100 samples as the test set.
Hardware Specification	Yes	We run all the models on a NVIDIA A100 GPU with 80 GB memory.
Software Dependencies	No	The paper mentions extending "Hugging Face's StableDiffusion3Pipeline" and using "CLIP text encoder", "T5 [46]", and "OpenCLIP [47]", but it does not specify exact version numbers for these software components, which are necessary for reproducible software dependencies.
Experiment Setup	Yes	Our modified pipeline extends Hugging Face's StableDiffusion3Pipeline to accept external embeddings and apply FairImagen at inference time. We generate images using classifier-free guidance with scale 𝑤 = 7.0 and 𝑇 = 28 diffusion steps. Images are generated in batches (12 per prompt), stitched, and evaluated with fairness and perceptual quality metrics. We split the dataset into a development set of 20 samples and use the remaining 100 samples as the test set. We tune all models on the development set to maximize the average (AVG) score and report their performance on the test set.
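
The settings quoted in the Dataset Splits and Experiment Setup rows can be collected into a small sketch. This is an illustration only, not the authors' code: the names GenerationConfig and split_dev_test, the placeholder profession names, and the seeded random shuffle are all assumptions. The paper states only the split sizes (20 development, 100 test), the guidance scale, the step count, and the batch size; it does not say how the 20 development samples were chosen.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationConfig:
    """Generation settings quoted in the Experiment Setup row (hypothetical container)."""
    guidance_scale: float = 7.0      # classifier-free guidance scale w
    num_inference_steps: int = 28    # diffusion steps T
    images_per_prompt: int = 12      # images generated per prompt


def split_dev_test(items, dev_size=20, seed=0):
    """Return (dev, test) lists after a seeded shuffle.

    The shuffle and the seed are assumptions made for this sketch;
    the paper reports only the resulting sizes.
    """
    if dev_size > len(items):
        raise ValueError("dev_size exceeds the number of items")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:dev_size], shuffled[dev_size:]


# Placeholder names standing in for the 120 professions in the extended list.
professions = [f"profession_{i:03d}" for i in range(120)]
dev_set, test_set = split_dev_test(professions)
config = GenerationConfig()
```

Fixing the seed makes the split repeatable across runs, which is the property a reader would need in order to tune on the same 20 samples and report on the same 100. The two numeric settings are named after the guidance_scale and num_inference_steps arguments that Hugging Face diffusers pipelines accept, so they could plausibly be passed through to such a call.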