UniBias: Unveiling and Mitigating LLM Bias through Internal Attention and FFN Manipulation

Authors: Hanzhang Zhou, Zijian Feng, Zixiao Zhu, Junlang Qian, Kezhi Mao

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Extensive experiments across 12 NLP datasets demonstrate that UniBias significantly enhances ICL performance and alleviates prompt brittleness of LLMs. |
| Researcher Affiliation | Academia | Hanzhang Zhou (1,2), Zijian Feng (1,2), Zixiao Zhu (1,2), Junlang Qian (1), Kezhi Mao (1,2); (1) Nanyang Technological University; (2) Singapore-ETH Centre; {hanzhang001, feng0119, zixiao001, junlang001}@e.ntu.edu.sg; ekzmao@ntu.edu.sg |
| Pseudocode | No | The paper does not contain a structured pseudocode or algorithm block. |
| Open Source Code | Yes | The code is available at https://github.com/hzzhou01/UniBias. |
| Open Datasets | Yes | We evaluate our UniBias method on 12 diverse natural language processing datasets across various tasks, including sentiment analysis, topic classification, natural language inference, reasoning, and word disambiguation. Statistics and details about the datasets can be found in Table 4 in the Appendix, which lists: SST2 [Socher et al., 2013], MNLI [Williams et al., 2018], WiC [Pilehvar and Camacho-Collados, 2019], COPA [Roemmele et al., 2011], CR [Hu and Liu, 2004], AGNews [Zhang et al., 2015], MR [Pang and Lee, 2005], RTE [Dagan et al., 2005], SST-5 [Socher et al., 2013], TREC [Voorhees and Tice, 2000], ARC-Challenge [Clark et al., 2018], MMLU [Hendrycks et al., 2020]. |
| Dataset Splits | Yes | In our experiments, we utilize k (where k = 0, 1, 2, 4) training samples per class as prompt examples for k-shot ICL. For testing, we randomly select 2000 samples for MMLU and 3000 samples for MNLI and MR, while employing the original test sets for the other datasets. Specifically, we utilize a small subset of training data as a support set, with 20 samples for each class. |
| Hardware Specification | Yes | Additionally, we conduct the experiment on four A5000 GPUs. |
| Software Dependencies | No | The paper mentions models like Llama-2 and GPT-J but does not provide specific version numbers for software dependencies (e.g., Python, deep learning frameworks, or other libraries). |
| Experiment Setup | Yes | For all experiments, unless stated otherwise, we use the 1-shot ICL setting, i.e., one example per class, and repeat five times under different random seeds. We use k = 20 samples per class as the support set to obtain all threshold values by grid search, as mentioned in the method section. The prompt template and further implementation details are specified in Appendix A. |
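The reported protocol (k-shot ICL with k examples per class, repeated over five random seeds) can be sketched as below. This is a minimal illustration, not the paper's code: the generic "Input/Label" template, the function name, and the toy data are all assumptions, since the actual prompt templates are specified in the paper's Appendix A.

```python
import random

def build_kshot_prompt(train_by_class, query, k=1, seed=0):
    """Sample k demonstration examples per class and prepend them to the query.

    train_by_class: dict mapping label -> list of example texts (hypothetical format).
    Returns a single prompt string using a generic "Input/Label" template,
    which is an assumption; the paper's real templates are in its Appendix A.
    """
    rng = random.Random(seed)
    demos = []
    for label, examples in sorted(train_by_class.items()):
        # One pass per class keeps the prompt balanced across labels.
        for text in rng.sample(examples, k):
            demos.append(f"Input: {text}\nLabel: {label}")
    # The query is appended last, with its label left for the model to fill in.
    demos.append(f"Input: {query}\nLabel:")
    return "\n\n".join(demos)

# Repeat over five seeds, mirroring the paper's five-run protocol (toy data).
train = {"positive": ["great movie", "loved it"],
         "negative": ["terrible", "boring plot"]}
prompts = [build_kshot_prompt(train, "a fine film", k=1, seed=s) for s in range(5)]
```

Each seed yields a different sampling of demonstrations, which is what makes the reported variance across five runs meaningful.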