Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs

Authors: Shashank Gupta, Vaishnavi Shrivastava, Ameet Deshpande, Ashwin Kalyan, Peter Clark, Ashish Sabharwal, Tushar Khot

ICLR 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our study covers 24 reasoning datasets (spanning mathematics, law, medicine, morals, and more), 4 LLMs (2 versions of ChatGPT-3.5, GPT-4-Turbo, and Llama-2-70b-chat), and 19 diverse personas (e.g., an Asian person) spanning 5 socio-demographic groups: race, gender, religion, disability, and political affiliation. Our experiments unveil that LLMs harbor deep-rooted bias against various socio-demographics underneath a veneer of fairness. |
| Researcher Affiliation | Collaboration | Allen Institute for AI; Stanford University; Princeton University |
| Pseudocode | No | The paper does not include pseudocode or a clearly labeled algorithm block. |
| Open Source Code | Yes | Code and model outputs: https://allenai.github.io/persona-bias. |
| Open Datasets | Yes | We select 24 datasets from MMLU (Hendrycks et al., 2021), Big-Bench-Hard (Suzgun et al., 2022), and MBPP (Austin et al., 2021) to evaluate the knowledge and reasoning abilities of LLMs in diverse domains. |
| Dataset Splits | Yes | For all datasets, we make use of the official test partitions in our evaluations. |
| Hardware Specification | No | To fit such a model within our GPUs, we use the AWQ quantized (Lin et al., 2023) model from Hugging Face (TheBloke/Llama-2-70b-Chat-AWQ). |
| Software Dependencies | Yes | We primarily focus on ChatGPT-3.5 (gpt-3.5-turbo-0613) as it has demonstrated impressive persona-following (Park et al., 2023) and reasoning (Qin et al., 2023) abilities. We also experimented with the latest release (Nov. 2023) of ChatGPT-3.5 (gpt-3.5-turbo-1106), GPT-4-Turbo (gpt-4-turbo-1106), and Llama-2-70b-chat, and include their results in Appendix D. |
| Experiment Setup | Yes | We use a max token length of 1024, temperature 0, and a top-p value of 1 (equivalent to greedy decoding). |
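The reported decoding setup maps directly onto standard chat-completion request parameters. A minimal sketch of building such a request, assuming the gpt-3.5-turbo-0613 snapshot quoted in the report; the persona instruction shown is illustrative, not necessarily the paper's exact prompt wording:

```python
def build_request(persona: str, question: str) -> dict:
    """Assemble request parameters matching the reported setup:
    temperature 0 with top_p 1 (greedy decoding), max 1024 tokens."""
    return {
        "model": "gpt-3.5-turbo-0613",  # primary model quoted in the report
        "temperature": 0,               # deterministic (greedy) decoding
        "top_p": 1,
        "max_tokens": 1024,
        "messages": [
            # Hypothetical persona instruction; the paper's exact prompt may differ.
            {"role": "system",
             "content": f"Adopt the identity of {persona}. Answer the questions "
                        "while staying in strict accordance with this identity."},
            {"role": "user", "content": question},
        ],
    }

params = build_request("an Asian person", "Which option is correct? ...")
```

With the OpenAI Python SDK, this dict could be passed as `client.chat.completions.create(**params)`; setting temperature to 0 makes top-p irrelevant, so the two together are indeed equivalent to greedy decoding as the authors note.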