reproducibilityindex.ai

A Unified Debiasing Approach for Vision-Language Models across Modalities and Tasks

Authors: Hoin Jung, Taeuk Jang, Xiaoqian Wang

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experimental results demonstrate SFID's effectiveness across various VLMs tasks including zero-shot classification, text-to-image retrieval, image captioning, and text-to-image generation, by significantly reducing gender biases without compromising performance.
Researcher Affiliation	Academia	Hoin Jung, Taeuk Jang, Xiaoqian Wang Elmore Family School of Electrical and Computer Engineering Purdue University West Lafayette, IN 47907 {jung414, jang141, joywang}@purdue.edu
Pseudocode	Yes	Algorithm 1: Selective Feature Imputation for Debiasing (SFID). Input: Frozen representation of debiasing training and validation dataset, (Z_D, y_D) and (Z_VD). Representation of query set in the downstream task, Z_Q. Output: Debiased representation in downstream task, Z Q
Open Source Code	Yes	The code is available on GitHub.
Open Datasets	Yes	We utilize the FACET [17] dataset, which includes 49,551 images across 52 classes with gender sensitive attribute. ... we utilize the Flickr30K [42] dataset which includes ground truth captions and gender attributes. ... The MS-COCO dataset [9] is used as the query dataset ... datasets like FairFace [22] for image inputs and Bias-in-Bios [12] for text inputs are employed.
Dataset Splits	Yes	Each dataset is split into training and validation sets. ... low confidence imputation (LCI) is defined as the average of the features in low-confidence samples from the validation set as determined by Random Forest. ... Data used for debiasing: 20,000 (training), 10,000 (imputation value) from FairFace
Hardware Specification	Yes	CPU: AMD EPYC 7313 16-Core Processor; GPU: NVIDIA RTX A5000
Software Dependencies	No	The paper states 'All hyperparameters and model settings for each baseline follow the default configurations provided in their respective open-source repositories.' but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup	Yes	All hyperparameters and model settings for each baseline follow the default configurations provided in their respective open-source repositories. Detailed experimental settings, along with evaluation metrics and query datasets, are described in Section 3. ... The number of pruned feature k is set as 50 by choosing an elbow point of the feature importance described in Appendix A.1. Moreover, the impact of a hyperparameter τ for thresholding low confidence samples is studied in Section 5.3.
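
The quoted excerpts describe three ingredients: a feature-importance ranking from which k features are pruned, a threshold τ that marks low-confidence validation samples, and a low confidence imputation (LCI) value computed as the average of the features over those samples. The following is a minimal sketch of how these pieces could fit together, not the authors' implementation. It assumes the importance scores and per-sample confidences have already been produced by a Random Forest trained on the sensitive attribute, that "confidence" means the highest predicted class probability, and that a sample counts as low-confidence when its confidence is strictly below τ. The function names `top_k_features`, `low_confidence_values`, and `impute` are hypothetical.

```python
from statistics import fmean


def top_k_features(importances, k):
    """Return the indices of the k largest importance scores, in ascending index order."""
    # Sort by descending importance; ties are broken by the lower index.
    order = sorted(range(len(importances)), key=lambda j: (-importances[j], j))
    return sorted(order[:k])


def low_confidence_values(val_features, val_confidence, feature_idx, tau):
    """Average each selected feature over validation rows whose confidence is below tau.

    Assumption: confidence is the classifier's highest class probability,
    and "low confidence" means strictly less than tau.
    """
    rows = [z for z, c in zip(val_features, val_confidence) if c < tau]
    if not rows:
        raise ValueError("no validation sample falls below the confidence threshold")
    return {j: fmean(row[j] for row in rows) for j in feature_idx}


def impute(query_features, fill_values):
    """Copy the query rows, overwriting the selected features with the fill values."""
    debiased = []
    for row in query_features:
        new_row = list(row)
        for j, value in fill_values.items():
            new_row[j] = value
        debiased.append(new_row)
    return debiased


# Toy example with 4-dimensional embeddings (all numbers are made up).
importances = [0.05, 0.60, 0.10, 0.25]
validation = [[1.0, 2.0, 3.0, 4.0], [3.0, 6.0, 5.0, 8.0], [9.0, 9.0, 9.0, 9.0]]
confidence = [0.55, 0.60, 0.95]

selected = top_k_features(importances, k=2)
fill = low_confidence_values(validation, confidence, selected, tau=0.7)
debiased = impute([[0.0, 0.0, 0.0, 0.0]], fill)
```

In this toy run the two most important features (indices 1 and 3) are replaced by their averages over the two low-confidence validation rows, while the remaining features of the query embedding are left untouched. With the settings quoted above, k would be 50 and the validation pool would be the 10,000 FairFace samples reserved for the imputation value.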