Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs

Authors: Shashank Gupta, Vaishnavi Shrivastava, Ameet Deshpande, Ashwin Kalyan, Peter Clark, Ashish Sabharwal, Tushar Khot

ICLR 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our study covers 24 reasoning datasets (spanning mathematics, law, medicine, morals, and more), 4 LLMs (2 versions of ChatGPT-3.5, GPT-4-Turbo, and Llama-2-70b-chat), and 19 diverse personas (e.g., an Asian person) spanning 5 socio-demographic groups: race, gender, religion, disability, and political affiliation. Our experiments unveil that LLMs harbor deep-rooted bias against various socio-demographics underneath a veneer of fairness. |
| Researcher Affiliation | Collaboration | Allen Institute for AI; Stanford University; Princeton University |
| Pseudocode | No | The paper does not include pseudocode or a clearly labeled algorithm block. |
| Open Source Code | Yes | Code and model outputs: https://allenai.github.io/persona-bias. |
| Open Datasets | Yes | We select 24 datasets from MMLU (Hendrycks et al., 2021), Big-Bench-Hard (Suzgun et al., 2022), and MBPP (Austin et al., 2021) to evaluate the knowledge and reasoning abilities of LLMs in diverse domains. |
| Dataset Splits | Yes | For all datasets, we make use of the official test partitions in our evaluations. |
| Hardware Specification | No | To fit such a model within our GPUs, we use the AWQ quantized (Lin et al., 2023) model from Hugging Face (TheBloke/Llama-2-70b-Chat-AWQ). |
| Software Dependencies | Yes | We primarily focus on ChatGPT-3.5 (gpt-3.5-turbo-0613) as it has demonstrated impressive persona-following (Park et al., 2023) and reasoning (Qin et al., 2023) abilities. We also experimented with the latest release (Nov. 2023) of ChatGPT-3.5 (gpt-3.5-turbo-1106), GPT-4-Turbo (gpt-4-turbo-1106), and Llama-2-70b-chat, and include their results in Appendix D. |
| Experiment Setup | Yes | We use a max token length of 1024, temperature 0, and a top-p value of 1 (equivalent to greedy decoding). |
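The reported decoding setup maps directly onto standard chat-completion request parameters. A minimal sketch of building such a request, assuming the gpt-3.5-turbo-0613 snapshot quoted in the report; the persona instruction shown is illustrative, not necessarily the paper's exact prompt wording:

```python
def build_request(persona: str, question: str) -> dict:
    """Assemble request parameters matching the reported setup:
    temperature 0 with top_p 1 (greedy decoding), max 1024 tokens."""
    return {
        "model": "gpt-3.5-turbo-0613",  # primary model quoted in the report
        "temperature": 0,               # deterministic (greedy) decoding
        "top_p": 1,
        "max_tokens": 1024,
        "messages": [
            # Hypothetical persona instruction; the paper's exact prompt may differ.
            {"role": "system",
             "content": f"Adopt the identity of {persona}. Answer the questions "
                        "while staying in strict accordance with this identity."},
            {"role": "user", "content": question},
        ],
    }

params = build_request("an Asian person", "Which option is correct? ...")
```

With the OpenAI Python SDK, this dict could be passed as `client.chat.completions.create(**params)`; setting temperature to 0 makes top-p irrelevant, so the two together are indeed equivalent to greedy decoding as the authors note.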