Mitigating Test-Time Bias for Fair Image Retrieval

Authors: Fanjie Kong, Shuai Yuan, Weituo Hao, Ricardo Henao

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our algorithm on real-world image search datasets, Occupation 1 and 2, as well as two large-scale image-text datasets, MS-COCO and Flickr30k. Our approach achieves the lowest bias, compared with various existing bias-mitigation methods, in text-based image retrieval results while maintaining satisfactory retrieval performance.
Researcher Affiliation | Collaboration | Fanjie Kong (Duke University, fanjie.kong@duke.edu); Shuai Yuan (Duke University, shuai@cs.duke.edu); Weituo Hao (TikTok Inc., weituohao@tiktok.com); Ricardo Henao (Duke University & KAUST, ricardo.henao@duke.edu)
Pseudocode | Yes | Algorithm 1 Post-hoc Bias Mitigation (PBM).
Open Source Code | Yes | The source code is publicly available at https://github.com/timqqt/Fair_Text_based_Image_Retrieval.
Open Datasets | Yes | For these two datasets, we consider OpenAI's CLIP ViT-B/16 (Radford et al., 2021) as the VL model for all debiasing methods. The first dataset, which we refer to as Occupation 1 (Kay et al., 2015), comprises the top 100 Google image search results for 45 gender-neutral occupation terms... Occupation 2 (Celis and Keswani, 2020), the second dataset, includes the top 100 Google image search results for 96 occupations... We consider MS-COCO (Lin et al., 2014) and Flickr30k (Plummer et al., 2015). Our setup aligns with Wang et al. (2021a), where the gender attributes are directly inferred from the text captions of images.
Dataset Splits | Yes | The first large-scale image-text dataset is the MS-COCO captions dataset, which is partitioned into 113,287 training images, 5,000 validation images, and 5,000 test images. The second large-scale image-text dataset employed in our experiment is Flickr30k, which contains 31,000 images obtained from Flickr. Adhering to the partitioning scheme presented in Plummer et al. (2015), we allocate 1,000 images each for validation and testing, with the remaining images designated for training.
Hardware Specification | Yes | All of our experiments ran on one NVIDIA TITAN Xp 12GB GPU with CUDA version 11.5.
Software Dependencies | Yes | All of our experiments ran on one NVIDIA TITAN Xp 12GB GPU with CUDA version 11.5. ... Further, each selection is solved by the GUROBI solver (Gurobi Optimization, LLC, 2023).
Experiment Setup | Yes | For adversarial learning, the trade-off is controlled by adjusting the adversarial loss weights between 0 and 1.0. In MI-clip, we modify the clipped dimensions from 10 to 500 (the CLIP output dimension is 512). Regarding PBM methods, a trade-off parameter is introduced via a stochastic variable θ, which denotes the likelihood of choosing a fair subset at any given time, instead of simply opting for the image with the top similarity score. ... The image classifier is a 3-layer multi-layer perceptron (MLP), as shown in Table 5, that takes the image representation from the original CLIP as input.
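The Experiment Setup row describes PBM's trade-off mechanism but Algorithm 1 itself is not reproduced above. The sketch below is an illustration only of that described behavior: with probability θ the retriever returns a gender-balanced subset of the candidates, and otherwise it returns the plain top-k by similarity. The function names (`pbm_retrieve`, `balanced_top_k`) are hypothetical, and the greedy balancing step is a stand-in for the paper's solver-based selection (which uses GUROBI).

```python
import random

def top_k_by_score(candidates, k):
    """Plain retrieval: the k candidates with the highest similarity score."""
    return sorted(candidates, key=lambda c: c["score"], reverse=True)[:k]

def balanced_top_k(candidates, k):
    """Greedy stand-in for the paper's solver-based fair-subset selection:
    alternate between the highest-scoring remaining candidates of each
    predicted gender group until k images are chosen."""
    groups = {}
    for c in sorted(candidates, key=lambda c: c["score"], reverse=True):
        groups.setdefault(c["gender"], []).append(c)
    picked, order = [], sorted(groups)
    while len(picked) < k and any(groups.values()):
        for g in order:
            if groups[g] and len(picked) < k:
                picked.append(groups[g].pop(0))
    return picked

def pbm_retrieve(candidates, k, theta, rng=random):
    """Stochastic trade-off: with probability theta return the fair
    (balanced) subset, otherwise the plain similarity top-k."""
    if rng.random() < theta:
        return balanced_top_k(candidates, k)
    return top_k_by_score(candidates, k)
```

Setting θ = 0 recovers vanilla retrieval, and θ = 1 always applies the fair selection, matching the table's description of θ as the likelihood of choosing a fair subset instead of the top-similarity image.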
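The Experiment Setup row also mentions a 3-layer MLP that predicts the protected attribute from the 512-dimensional CLIP image representation. The paper's Table 5 (with the exact layer widths) is not reproduced here, so the hidden sizes below are assumptions; this is only a minimal forward-pass sketch of such a classifier, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed widths: 512-d CLIP features -> 256 -> 64 -> 2 attribute logits.
# The true widths are specified in the paper's Table 5 (not shown here).
sizes = [512, 256, 64, 2]
weights = [rng.normal(0.0, 0.02, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def mlp_forward(x):
    """3-layer MLP: ReLU on the hidden layers, raw logits at the output."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ W + b, 0.0)      # hidden layer + ReLU
    return h @ weights[-1] + biases[-1]     # class logits

clip_features = rng.normal(size=(4, 512))   # a fake batch of CLIP embeddings
logits = mlp_forward(clip_features)         # shape (4, 2), one logit per class
```

In the paper this classifier supplies the predicted gender attributes that the fair-subset selection balances over.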