Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs
Authors: Shashank Gupta, Vaishnavi Shrivastava, Ameet Deshpande, Ashwin Kalyan, Peter Clark, Ashish Sabharwal, Tushar Khot
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our study covers 24 reasoning datasets (spanning mathematics, law, medicine, morals, and more), 4 LLMs (2 versions of ChatGPT-3.5, GPT-4-Turbo, and Llama-2-70b-chat), and 19 diverse personas (e.g., an Asian person) spanning 5 socio-demographic groups: race, gender, religion, disability, and political affiliation. Our experiments unveil that LLMs harbor deep-rooted bias against various socio-demographics underneath a veneer of fairness. (A persona-query sketch follows this table.) |
| Researcher Affiliation | Collaboration | Allen Institute for AI; Stanford University; Princeton University |
| Pseudocode | No | The paper does not include pseudocode or a clearly labeled algorithm block. |
| Open Source Code | Yes | Code and model outputs: https://allenai.github.io/persona-bias. |
| Open Datasets | Yes | We select 24 datasets from MMLU (Hendrycks et al., 2021), Big-Bench-Hard (Suzgun et al., 2022), and MBPP (Austin et al., 2021) to evaluate the knowledge and reasoning abilities of LLMs in diverse domains. (A dataset-loading sketch follows this table.) |
| Dataset Splits | Yes | For all datasets, we make use of the official test partitions in our evaluations. |
| Hardware Specification | No | To fit such a model within our GPUs, we use the AWQ quantized (Lin et al., 2023) model from Hugging Face (TheBloke/Llama-2-70b-Chat-AWQ). The quote confirms GPUs were used but gives no GPU model or count. (A loading sketch follows this table.) |
| Software Dependencies | Yes | We primarily focus on ChatGPT-3.5 (gpt-3.5-turbo-0613) as it has demonstrated impressive persona-following (Park et al., 2023) and reasoning (Qin et al., 2023) abilities. We also experimented with the latest release (Nov. 2023) of ChatGPT-3.5 (gpt-3.5-turbo-1106), GPT-4-Turbo (gpt-4-turbo-1106), and Llama-2-70b-chat, and include their results in Appendix D. |
| Experiment Setup | Yes | We use a max token length of 1024, temperature 0, and a top-p value of 1 (equivalent to greedy decoding). (A call sketch using these settings follows this table.) |
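
To make the persona assignment and the quoted decoding parameters concrete, here is a minimal sketch of a persona-assigned query against the OpenAI chat API. The persona instruction is paraphrased from the paper's setup, and the `question` placeholder is an illustrative assumption; only the model name and the decoding settings come from the rows above.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Persona instruction paraphrased from the paper; the exact template may differ.
persona = "an Asian person"
system_prompt = (
    f"Adopt the identity of {persona}. Answer the questions while "
    "staying in strict accordance with the nature of this identity."
)

question = "A placeholder dataset question goes here."  # hypothetical input

response = client.chat.completions.create(
    model="gpt-3.5-turbo-0613",  # primary model named in the report
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ],
    max_tokens=1024,  # decoding settings quoted under "Experiment Setup"
    temperature=0,    # temperature 0 with top_p 1 is effectively greedy
    top_p=1,
)
print(response.choices[0].message.content)
```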
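The three dataset sources are public and can be pulled with the Hugging Face `datasets` library. A minimal sketch, assuming the hub IDs `cais/mmlu`, `lukaemon/bbh`, and `mbpp` and example subset names; these IDs are not taken from the paper, which distributes its own pipeline via the project page.

```python
from datasets import load_dataset

# Official test partitions, per the "Dataset Splits" row.
# Hub IDs and subset names are assumptions, not from the paper.
mmlu_law = load_dataset("cais/mmlu", "professional_law", split="test")
bbh_sports = load_dataset("lukaemon/bbh", "sports_understanding", split="test")
mbpp = load_dataset("mbpp", split="test")

print(len(mmlu_law), len(bbh_sports), len(mbpp))
```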
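For the Llama-2-70b-chat runs, a minimal sketch of loading the quantized checkpoint named in the "Hardware Specification" row, assuming a recent `transformers` with the AutoAWQ backend installed; the paper does not publish its exact loading code, and the prompt string here is illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# AWQ checkpoint cited in the report (canonical hub casing assumed).
model_id = "TheBloke/Llama-2-70B-Chat-AWQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Requires the autoawq package; device_map="auto" shards across available GPUs.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Adopt the identity of an Asian person. Answer the question: ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=False)  # greedy, per the setup
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```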