Discovering Bias in Latent Space: An Unsupervised Debiasing Approach

Authors: Dyah Adila, Shuai Zhang, Boran Han, Bernie Wang

ICML 2024

Reproducibility Variable Result LLM Response
Research Type Experimental In this section, we empirically verify the effectiveness of STEERFAIR with regard to: (1) Mitigating Order Bias (Section 4.1): Our unsupervised method mitigates the model's tendency to choose options at specific positions and is competitive with (and sometimes outperforms) supervised methods (Li et al., 2023). (2) Generalization Capability (Section 4.2): We show that the bias direction identified by STEERFAIR is generalizable across datasets with the same task.
Researcher Affiliation Collaboration (1) Department of Computer Science, University of Wisconsin-Madison; (2) Amazon Web Services.
Pseudocode Yes Algorithm 1 Identifying bias direction with STEERFAIR
Open Source Code No The paper mentions using and adapting code from other repositories (e.g., "ITI (Li et al., 2023) code is adapted from the author’s original repository", "We use IDEFICS and InstructBLIP models from Hugging Face... and LLaVA from the author’s repository"), but it does not explicitly state that the source code for STEERFAIR, the methodology described in this paper, is open-source or provide a link to it.
Open Datasets Yes We test our method on three Multiple Choice Question (MCQ) and yes/no question-answering datasets: ScienceQA (Lu et al., 2022), MME Benchmark (Fu et al., 2023), and Visual Genome Relation (VGR) (Lin et al., 2014).
Dataset Splits Yes For VGR, we randomly split the dataset into an 80:20 train/validation:test split, and randomly sample 1000 samples from the train split to find the bias direction.
Hardware Specification Yes We use 8 Tesla V100 GPUs for hyperparameter tuning and evaluation.
Software Dependencies No The paper mentions using various models and platforms like "LLaVA from the author’s repository (Liu et al., 2023b)", "IDEFICS from Hugging Face (Wolf et al., 2020)", and "InstructBLIP", but it does not specify exact version numbers for programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or other key software libraries necessary for replication.
Experiment Setup Yes We perform hyperparameter search for both STEERFAIR and ITI. We list the hyperparameter search space in Table 6. ITI: intervention strength α ∈ {1, 5, 10, 15, 20, 25, 30, 40, 50}; number of heads K ∈ {1, 10, 20, 30, 40, 50, 100}. STEERFAIR: intervention strength α ∈ {0.1, 0.5, 1, 2, 5, 10, 15, 20, 25}; number of heads K ∈ {10, 30, 50, 100, 200, 500}.
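The search space in Table 6 can be enumerated with a short sketch like the one below. It assumes an exhaustive grid over every (α, K) pair; the paper only says "hyperparameter search", so the search strategy is an assumption. The names `SEARCH_SPACE`, `iter_configs`, `grid_search`, and the `evaluate` callback are hypothetical and do not come from the paper or any released code.

```python
from itertools import product

# Values copied from Table 6; the dictionary layout itself is illustrative.
SEARCH_SPACE = {
    "ITI": {
        "alpha": [1, 5, 10, 15, 20, 25, 30, 40, 50],
        "num_heads": [1, 10, 20, 30, 40, 50, 100],
    },
    "STEERFAIR": {
        "alpha": [0.1, 0.5, 1, 2, 5, 10, 15, 20, 25],
        "num_heads": [10, 30, 50, 100, 200, 500],
    },
}


def iter_configs(method):
    """Yield every (alpha, num_heads) combination for one method."""
    space = SEARCH_SPACE[method]
    for alpha, num_heads in product(space["alpha"], space["num_heads"]):
        yield {"method": method, "alpha": alpha, "num_heads": num_heads}


def grid_search(method, evaluate):
    """Return the configuration with the highest score.

    `evaluate` is a hypothetical callback that maps a configuration to a
    score where higher is better (for example, validation accuracy).
    """
    return max(iter_configs(method), key=evaluate)
```

Under this assumption the grid contains 63 configurations for ITI and 54 for STEERFAIR, which gives a rough sense of the tuning cost on the reported hardware.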