Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

DEXTER: Diffusion-Guided EXplanations with TExtual Reasoning for Vision Models

Authors: Simone Carnemolla, Matteo Pennisi, Sarinda Samarasinghe, Giovanni Bellitto, Simone Palazzo, Daniela Giordano, Mubarak Shah, Concetto Spampinato

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Quantitative and qualitative evaluations, including a user study, show that DEXTER produces accurate, interpretable outputs. Experiments on ImageNet, Waterbirds, CelebA, and FairFaces confirm that DEXTER outperforms existing approaches in global model explanation and class-level bias reporting.
Researcher Affiliation | Academia | 1 University of Catania; 2 University of Central Florida
Pseudocode | Yes | B. DEXTER Algorithm: Algorithm 1 (DEXTER)
Open Source Code | Yes | Code is available at https://github.com/perceivelab/dexter.
Open Datasets | Yes | Experiments on ImageNet, Waterbirds, CelebA, and FairFaces confirm that DEXTER outperforms existing approaches in global model explanation and class-level bias reporting.
Dataset Splits | No | The paper uses standard datasets such as ImageNet, Waterbirds, CelebA, and FairFaces, and references external training schemes like the debiased training scheme from [35], but it does not explicitly state the training, validation, or test split percentages or sample counts for the experiments conducted in this paper.
Hardware Specification | Yes | All experiments ran in half-precision on three H100 GPUs.
Software Dependencies | Yes | We adopt CLIP as the text encoder and Stable Diffusion v1.4 as the diffusion model. To reduce inference time, we employ the Latent Consistency Model (LCM) LoRA adapter using 4 inference steps. Hugging Face Stable Diffusion id: compvis/stable-diffusion-v1-4. Hugging Face LoRA id: latent-consistency/lcm-lora-sdv1-5.
Experiment Setup | Yes | DEXTER is trained with a batch size of 1 (i.e., one image per iteration) and a learning rate of 0.1 across all tasks. ... for multi-word optimization, the fixed prompt is "a picture of a [MASK] with [MASK] and [MASK] and [MASK] and [MASK] and [MASK]". We set the sequence P of soft prompts p to 1... The temperature τ for the Gumbel softmax is kept at its default value of 1.0. ... We used 1000 DEXTER optimization steps and single word prompting... For bias reasoning, we generate 50 images... over the course of up to 5,000 optimization steps... temperature of 0.2... max tokens parameter is set to 0... top_p is fixed at 1.0, while both the frequency_penalty and the presence_penalty are set to 0.0.
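
For readers who want to reproduce the generation backbone named under Software Dependencies, the following is a minimal sketch of loading Stable Diffusion v1.4 with the LCM LoRA adapter through the Hugging Face diffusers library and sampling with 4 steps. It is not taken from the DEXTER repository. The names `GenerationConfig`, `build_pipeline`, and `generate` are hypothetical, and the `guidance_scale` and `device` values are assumptions, since the report does not state them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationConfig:
    # Model ids from the Software Dependencies row. The report lowercases
    # the first id; the Hub repository is named "CompVis/...".
    base_model_id: str = "CompVis/stable-diffusion-v1-4"
    lcm_lora_id: str = "latent-consistency/lcm-lora-sdv1-5"
    num_inference_steps: int = 4   # reported LCM step count
    half_precision: bool = True    # Hardware row: "half-precision"
    guidance_scale: float = 0.0    # ASSUMPTION: not stated in the report
    device: str = "cuda"           # ASSUMPTION: any CUDA-capable GPU


def build_pipeline(cfg: GenerationConfig = GenerationConfig()):
    """Load the diffusion pipeline and attach the LCM LoRA (downloads weights)."""
    # Imported lazily so the config can be inspected without torch/diffusers.
    import torch
    from diffusers import LCMScheduler, StableDiffusionPipeline

    dtype = torch.float16 if cfg.half_precision else torch.float32
    pipe = StableDiffusionPipeline.from_pretrained(
        cfg.base_model_id, torch_dtype=dtype
    )
    # LCM LoRA weights are intended to be sampled with the LCM scheduler.
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
    pipe.load_lora_weights(cfg.lcm_lora_id)
    return pipe.to(cfg.device)


def generate(pipe, prompt: str, cfg: GenerationConfig = GenerationConfig()):
    """Generate one image for a text prompt using the few-step schedule."""
    result = pipe(
        prompt,
        num_inference_steps=cfg.num_inference_steps,
        guidance_scale=cfg.guidance_scale,
    )
    return result.images[0]
```

Calling `build_pipeline()` followed by `generate(pipe, "a picture of a bird")` would produce a single image; the prompt here is only an illustration.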
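
The Experiment Setup entry mentions a Gumbel softmax with temperature τ = 1.0. As a rough illustration of what that temperature controls, the sketch below draws one relaxed one-hot sample over a set of candidate words. It is a generic, dependency-free version of the technique, not DEXTER's implementation; the function name `gumbel_softmax`, the `rng` argument, and the example logits are all hypothetical.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Return a relaxed one-hot sample over len(logits) categories."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(u)) for u in the open interval (0, 1).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        # The temperature scales the perturbed logit; tau = 1.0 as reported.
        noisy.append((logit + gumbel) / tau)
    # Numerically stable softmax over the perturbed logits.
    peak = max(noisy)
    exps = [math.exp(value - peak) for value in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits over three candidate words for one [MASK] slot.
example_logits = [2.0, 0.5, -1.0]
weights = gumbel_softmax(example_logits, tau=1.0)
```

Smaller values of `tau` push the weights toward a hard one-hot choice, while larger values spread them more evenly across the candidates.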