Mitigating Test-Time Bias for Fair Image Retrieval

Authors: Fanjie Kong, Shuai Yuan, Weituo Hao, Ricardo Henao

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our algorithm on real-world image search datasets, Occupation 1 and 2, as well as two large-scale image-text datasets, MS-COCO and Flickr30k. Our approach achieves the lowest bias, compared with various existing bias-mitigation methods, in text-based image retrieval results while maintaining satisfactory retrieval performance.
Researcher Affiliation | Collaboration | Fanjie Kong (Duke University, fanjie.kong@duke.edu); Shuai Yuan (Duke University, shuai@cs.duke.edu); Weituo Hao (TikTok Inc., weituohao@tiktok.com); Ricardo Henao (Duke University & KAUST, ricardo.henao@duke.edu)
Pseudocode | Yes | Algorithm 1 Post-hoc Bias Mitigation (PBM).
Open Source Code | Yes | The source code is publicly available at https://github.com/timqqt/Fair_Text_based_Image_Retrieval.
Open Datasets | Yes | For these two datasets, we consider OpenAI's CLIP ViT-B/16 (Radford et al., 2021) as the VL model for all debiasing methods. The first dataset, which we refer to as Occupation 1 (Kay et al., 2015), comprises the top 100 Google image search results for 45 gender-neutral occupation terms... Occupation 2 (Celis and Keswani, 2020), the second dataset, includes the top 100 Google image search results for 96 occupations... We consider MS-COCO (Lin et al., 2014) and Flickr30k (Plummer et al., 2015). Our setup aligns with Wang et al. (2021a), where the gender attributes are directly inferred from the text captions of images.
Dataset Splits | Yes | The first large-scale image-text dataset is the MS-COCO captions dataset, which is partitioned into 113,287 training images, 5,000 validation images, and 5,000 test images. The second large-scale image-text dataset employed in our experiment is Flickr30k, which contains 31,000 images obtained from Flickr. Adhering to the partitioning scheme presented in Plummer et al. (2015), we allocate 1,000 images each for validation and testing, with the remaining images designated for training.
Hardware Specification | Yes | All of our experiments ran on one NVIDIA TITAN Xp 12GB GPU with CUDA version 11.5.
Software Dependencies | Yes | All of our experiments ran on one NVIDIA TITAN Xp 12GB GPU with CUDA version 11.5. ... Further, each selection is solved by the GUROBI solver (Gurobi Optimization, LLC, 2023).
Experiment Setup | Yes | For adversarial learning, the trade-off is controlled by adjusting the adversarial loss weights between 0 and 1.0. In MI-clip, we modify the clipped dimensions from 10 to 500 (the CLIP output dimension is 512). Regarding PBM methods, a trade-off parameter is introduced via a stochastic variable θ, which denotes the likelihood of choosing a fair subset at any given time, instead of simply opting for the image with the top similarity score. ... The image classifier is a 3-layer multi-layer perceptron (MLP), as shown in Table 5, that takes the image representation from the original CLIP as input.
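The Experiment Setup row describes PBM's trade-off mechanism but Algorithm 1 itself is not reproduced above. The sketch below is an illustration only of that described behavior: with probability θ the retriever returns a gender-balanced subset of the candidates, and otherwise it returns the plain top-k by similarity. The function names (`pbm_retrieve`, `balanced_top_k`) are hypothetical, and the greedy balancing step is a stand-in for the paper's solver-based selection (which uses GUROBI).

```python
import random

def top_k_by_score(candidates, k):
    """Plain retrieval: the k candidates with the highest similarity score."""
    return sorted(candidates, key=lambda c: c["score"], reverse=True)[:k]

def balanced_top_k(candidates, k):
    """Greedy stand-in for the paper's solver-based fair-subset selection:
    alternate between the highest-scoring remaining candidates of each
    predicted gender group until k images are chosen."""
    groups = {}
    for c in sorted(candidates, key=lambda c: c["score"], reverse=True):
        groups.setdefault(c["gender"], []).append(c)
    picked, order = [], sorted(groups)
    while len(picked) < k and any(groups.values()):
        for g in order:
            if groups[g] and len(picked) < k:
                picked.append(groups[g].pop(0))
    return picked

def pbm_retrieve(candidates, k, theta, rng=random):
    """Stochastic trade-off: with probability theta return the fair
    (balanced) subset, otherwise the plain similarity top-k."""
    if rng.random() < theta:
        return balanced_top_k(candidates, k)
    return top_k_by_score(candidates, k)
```

Setting θ = 0 recovers vanilla retrieval, and θ = 1 always applies the fair selection, matching the table's description of θ as the likelihood of choosing a fair subset instead of the top-similarity image.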
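The Experiment Setup row also mentions a 3-layer MLP that predicts the protected attribute from the 512-dimensional CLIP image representation. The paper's Table 5 (with the exact layer widths) is not reproduced here, so the hidden sizes below are assumptions; this is only a minimal forward-pass sketch of such a classifier, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed widths: 512-d CLIP features -> 256 -> 64 -> 2 attribute logits.
# The true widths are specified in the paper's Table 5 (not shown here).
sizes = [512, 256, 64, 2]
weights = [rng.normal(0.0, 0.02, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def mlp_forward(x):
    """3-layer MLP: ReLU on the hidden layers, raw logits at the output."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ W + b, 0.0)      # hidden layer + ReLU
    return h @ weights[-1] + biases[-1]     # class logits

clip_features = rng.normal(size=(4, 512))   # a fake batch of CLIP embeddings
logits = mlp_forward(clip_features)         # shape (4, 2), one logit per class
```

In the paper this classifier supplies the predicted gender attributes that the fair-subset selection balances over.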