A Unified Debiasing Approach for Vision-Language Models across Modalities and Tasks

Authors: Hoin Jung, Taeuk Jang, Xiaoqian Wang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results demonstrate SFID's effectiveness across various VLMs tasks including zero-shot classification, text-to-image retrieval, image captioning, and text-to-image generation, by significantly reducing gender biases without compromising performance.
Researcher Affiliation | Academia | Hoin Jung, Taeuk Jang, Xiaoqian Wang; Elmore Family School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907; {jung414, jang141, joywang}@purdue.edu
Pseudocode | Yes | Algorithm 1: Selective Feature Imputation for Debiasing (SFID). Input: Frozen representation of debiasing training and validation dataset, (ZD, y D) and (ZV D). Representation of query set in the downstream task, ZQ. Output: Debiased representation in downstream task, Z Q
Open Source Code | Yes | The code is available on GitHub.
Open Datasets | Yes | We utilize the FACET [17] dataset, which includes 49,551 images across 52 classes with gender sensitive attribute. ... we utilize the Flickr30K [42] dataset which includes ground truth captions and gender attributes. ... The MS-COCO dataset [9] is used as the query dataset ... datasets like FairFace [22] for image inputs and Bias-in-Bios [12] for text inputs are employed.
Dataset Splits | Yes | Each dataset is split into training and validation sets. ... low confidence imputation (LCI) is defined as the average of the features in low-confidence samples from the validation set as determined by Random Forest. ... Data used for debiasing: 20,000 (training), 10,000 (imputation value) from FairFace
Hardware Specification | Yes | CPU: AMD EPYC 7313 16-Core Processor; GPU: NVIDIA RTX A5000
Software Dependencies | No | The paper states 'All hyperparameters and model settings for each baseline follow the default configurations provided in their respective open-source repositories.' but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | All hyperparameters and model settings for each baseline follow the default configurations provided in their respective open-source repositories. Detailed experimental settings, along with evaluation metrics and query datasets, are described in Section 3. ... The number of pruned feature k is set as 50 by choosing an elbow point of the feature importance described in Appendix A.1. Moreover, the impact of a hyperparameter τ for thresholding low confidence samples is studied in Section 5.3.
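
The rows above mention three pieces of SFID: feature importance from a Random Forest, pruning of the top k features, and low confidence imputation (LCI) computed from validation samples below a threshold τ. The following is a minimal sketch of how those pieces could fit together, not the authors' implementation. The function name `sfid_impute`, its parameters, the default `tau=0.7`, and the choice of "max class probability below τ" as the low-confidence rule are all assumptions made for illustration. The attribute classifier itself is left out: `importances` and `val_confidence` stand in for its outputs (with scikit-learn these could come from `RandomForestClassifier.feature_importances_` and `predict_proba`).

```python
import numpy as np


def sfid_impute(z_query, z_val, importances, val_confidence, k=50, tau=0.7):
    """Hypothetical sketch: overwrite the k most attribute-predictive features.

    z_query:        (n_query, d) frozen embeddings from the downstream task
    z_val:          (n_val, d) frozen embeddings from the debiasing validation set
    importances:    (d,) per-feature importance from the attribute classifier
    val_confidence: (n_val,) classifier confidence on z_val (assumed: max class prob.)
    tau:            assumed threshold; the value used in the paper is not given here
    """
    # Indices of the k features the attribute classifier relies on most.
    top_k = np.argsort(importances)[::-1][:k]

    # Validation samples on which the classifier is unsure about the attribute.
    low_conf = val_confidence < tau
    if not low_conf.any():
        raise ValueError("no validation sample falls below the confidence threshold")

    # LCI value: feature-wise mean of the selected features over those samples.
    lci = z_val[low_conf][:, top_k].mean(axis=0)

    # Overwrite only the selected features; all other dimensions stay unchanged.
    z_debiased = z_query.copy()
    z_debiased[:, top_k] = lci
    return z_debiased, top_k


# Synthetic usage example (random data, not from any dataset named above).
rng = np.random.default_rng(0)
d = 8
z_val = rng.normal(size=(100, d))
z_query = rng.normal(size=(5, d))
importances = np.array([0.4, 0.3, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05])
val_confidence = rng.uniform(0.5, 1.0, size=100)
z_debiased, top_k = sfid_impute(
    z_query, z_val, importances, val_confidence, k=2, tau=0.7
)
```

In this sketch every query embedding receives the same imputed values in the selected dimensions, so those dimensions no longer vary with the input, while the embedding keeps its original dimensionality and can be passed to the downstream task unchanged.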