LIDAO: Towards Limited Interventions for Debiasing (Large) Language Models

Authors: Tianci Liu, Haoyu Wang, Shiyang Wang, Yu Cheng, Jing Gao

Venue: ICML 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experiments on three LMs ranging from 0.7B to 7B parameters demonstrate the superiority of our method. In this section we experiment with the proposed (e)LIDAO on debiasing three LMs ranging from 0.7B to 7B parameters on three tasks. |
| Researcher Affiliation | Academia | ¹Purdue University, ²The Chinese University of Hong Kong. Correspondence to: Yu Cheng <chengyu@cse.cuhk.edu.hk>, Jing Gao <jinggao@purdue.edu>. |
| Pseudocode | Yes | Algorithm 1: the (e)LIDAO algorithm. |
| Open Source Code | No | The paper provides neither a link to a source-code repository nor an explicit statement that code for the described method is released. |
| Open Datasets | Yes | We focus on a set of paired adversarial prompts released by Yang et al. (2023) that encourage toxic and biased text, chosen for their high quality and challenging nature. The dataset consists of 175 pairs handcrafted from a 1K subset of RealToxicityPrompts (Gehman et al., 2020). |
| Dataset Splits | No | The paper does not specify training, validation, or test splits; it describes the evaluation dataset but gives no explicit splits for model training or validation. |
| Hardware Specification | No | The paper gives no details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper cites the Adam optimizer (Kingma & Ba, 2014) and Nucleus Sampling (Holtzman et al., 2019) but lists no version numbers for software dependencies such as the programming language or deep-learning framework (e.g., Python 3.x, PyTorch 1.x). A decoding sketch follows the table. |
| Experiment Setup | Yes | In both UDDIA and (e)LIDAO, the redo mechanism is omitted and bias-tuning is applied to the top 18 layers. Following Yang et al. (2023), one gradient-descent step is taken with the Adam optimizer (Kingma & Ba, 2014). All algorithms use Nucleus Sampling (Holtzman et al., 2019). Table 6 reports the per-model hyper-parameters for GPT-2, OPT, and Falcon: Sampling (probability coverage, temperature, repetition penalty), Bias-Tuning (learning rate, top layers to tune), and mixed weight τ. A bias-tuning sketch follows the table. |