A Unified Debiasing Approach for Vision-Language Models across Modalities and Tasks
Authors: Hoin Jung, Taeuk Jang, Xiaoqian Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results demonstrate SFID's effectiveness across various VLMs tasks including zero-shot classification, text-to-image retrieval, image captioning, and text-to-image generation, by significantly reducing gender biases without compromising performance. |
| Researcher Affiliation | Academia | Hoin Jung, Taeuk Jang, Xiaoqian Wang Elmore Family School of Electrical and Computer Engineering Purdue University West Lafayette, IN 47907 {jung414, jang141, joywang}@purdue.edu |
| Pseudocode | Yes | Algorithm 1 Selective Feature Imputation for Debiasing (SFID) Input: Frozen representation of debiasing training and validation dataset, (Z_D, y_D) and (Z_VD). Representation of query set in the downstream task, Z_Q. Output: Debiased representation in downstream task, Ẑ_Q (a sketch of this procedure follows the table) |
| Open Source Code | Yes | The code is available on GitHub. |
| Open Datasets | Yes | We utilize the FACET [17] dataset, which includes 49,551 images across 52 classes with gender sensitive attribute. ... we utilize the Flickr30K [42] dataset which includes ground truth captions and gender attributes. ... The MS-COCO dataset [9] is used as the query dataset ... datasets like FairFace [22] for image inputs and Bias-in-Bios [12] for text inputs are employed. |
| Dataset Splits | Yes | Each dataset is split into training and validation sets. ... low confidence imputation (LCI) is defined as the average of the features in low-confidence samples from the validation set as determined by Random Forest. ... Data used for debiasing: 20,000 (training), 10,000 (imputation value) from FairFace |
| Hardware Specification | Yes | CPU: AMD EPYC 7313 16-Core Processor; GPU: NVIDIA RTX A5000 |
| Software Dependencies | No | The paper states 'All hyperparameters and model settings for each baseline follow the default configurations provided in their respective open-source repositories.' but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | All hyperparameters and model settings for each baseline follow the default configurations provided in their respective open-source repositories. Detailed experimental settings, along with evaluation metrics and query datasets, are described in Section 3. ... The number of pruned features k is set as 50 by choosing an elbow point of the feature importance described in Appendix A.1. Moreover, the impact of a hyperparameter τ for thresholding low confidence samples is studied in Section 5.3. |
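
To make the pseudocode, low-confidence imputation (LCI), and hyperparameters quoted above more concrete, here is a minimal Python sketch of one plausible reading of the SFID procedure. The symbols Z_D, y_D, Z_VD, Z_Q and the value k = 50 come from the rows above; the use of scikit-learn's `RandomForestClassifier`, the confidence rule (maximum predicted probability below τ), the default τ value, the fallback when no validation sample is low-confidence, and the function name `sfid_debias` are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch of SFID (Selective Feature Imputation for Debiasing),
# reconstructed only from the pseudocode and hyperparameters quoted in the table.
# Anything not quoted there (RF settings, tau default, fallback) is an assumption.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def sfid_debias(Z_D, y_D, Z_VD, Z_Q, k=50, tau=0.6, seed=0):
    """Return a debiased copy of the query representations Z_Q.

    Z_D  : (n_train, d) frozen embeddings of the debiasing training set
    y_D  : (n_train,)   sensitive-attribute labels (e.g., gender)
    Z_VD : (n_val, d)   frozen embeddings of the debiasing validation set
    Z_Q  : (n_query, d) frozen embeddings of the downstream query set
    k    : number of features to prune (elbow point of feature importance)
    tau  : confidence threshold defining "low-confidence" validation samples
    """
    # 1) Fit a Random Forest to find features predictive of the sensitive attribute.
    rf = RandomForestClassifier(n_estimators=100, random_state=seed)
    rf.fit(Z_D, y_D)

    # 2) Select the top-k most attribute-informative feature dimensions to prune.
    pruned_idx = np.argsort(rf.feature_importances_)[::-1][:k]

    # 3) Low-Confidence Imputation (LCI): average the pruned features over
    #    validation samples the classifier is least confident about.
    conf = rf.predict_proba(Z_VD).max(axis=1)
    low_conf = Z_VD[conf < tau]
    if len(low_conf) == 0:            # assumed fallback: use the full validation set
        low_conf = Z_VD
    lci_values = low_conf[:, pruned_idx].mean(axis=0)

    # 4) Replace the pruned dimensions of the query embeddings with the LCI values.
    Z_Q_debiased = Z_Q.copy()
    Z_Q_debiased[:, pruned_idx] = lci_values
    return Z_Q_debiased
```

The key design point reflected here is that pruned dimensions are not zeroed out but imputed with values averaged from low-confidence samples, which is how the quoted text describes LCI preserving downstream performance while removing attribute-predictive signal.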