Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Leveraging Diffusion Perturbations for Measuring Fairness in Computer Vision
Authors: Nicholas Lui, Bryan Chia, William Berrios, Candace Ross, Douwe Kiela
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using this dataset, we benchmark several vision-language models on a multiclass occupation classification task. We find that images generated with non-Caucasian labels have a significantly higher occupation misclassification rate than images generated with Caucasian labels, and that several misclassifications are suggestive of racial biases. |
| Researcher Affiliation | Collaboration | ¹Stanford University, ²Contextual AI, ³Meta AI |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | We plan to release our code under a permissive license at this link: github.com/niclui/diffusion-perturbations. |
| Open Datasets | Yes | To enable greater exploration of our work, we release our generated dataset at this link: bit.ly/occupation-dataset. |
| Dataset Splits | No | The paper does not specify explicit training/validation/test splits for the generated dataset, as its primary experiment is evaluating pre-trained models rather than training new ones on its own dataset. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions models like Stable Diffusion, ViLT-B/32 VQA, CLIP, and FLAVA but does not provide specific software dependencies with version numbers (e.g., Python or PyTorch versions). |
| Experiment Setup | No | The paper states 'We discuss our choice of hyperparameters in Appendix A1,' indicating that specific experimental setup details like hyperparameters are not included in the main text. |