LIDAO: Towards Limited Interventions for Debiasing (Large) Language Models
Authors: Tianci Liu, Haoyu Wang, Shiyang Wang, Yu Cheng, Jing Gao
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on three LMs ranging from 0.7B to 7B parameters demonstrate the superiority of our method. In this section we experiment with the proposed (e)LIDAO on debiasing three LMs ranging from 0.7B to 7B parameters on three tasks. |
| Researcher Affiliation | Academia | 1Purdue University 2The Chinese University of Hong Kong. Correspondence to: Yu Cheng <chengyu@cse.cuhk.edu.hk>, Jing Gao <jinggao@purdue.edu>. |
| Pseudocode | Yes | Algorithm 1 (e)LIDAO algorithm |
| Open Source Code | No | The paper does not provide a direct link to a source code repository or an explicit statement about the release of code for the described methodology. |
| Open Datasets | Yes | We focus on a set of paired adversarial prompts released by Yang et al. (2023) that encourage toxic and biased texts, chosen for their high quality and challenging nature. This dataset consists of 175 pairs handcrafted from a 1K subset of the Real Toxicity Prompts (Gehman et al., 2020). |
| Dataset Splits | No | The paper does not provide specific details about training, validation, and test dataset splits for the models or data used in their experiments. It describes the dataset used for evaluation but not explicit splits for model training or validation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions 'Adam optimizer (Kingma & Ba, 2014)' and 'Nucleus Sampling (Holtzman et al., 2019)' but does not provide specific version numbers for software dependencies like programming languages or deep learning frameworks (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | In both UDDIA and (e)LIDAO, we omitted the redo mechanism and applied the bias-tuning to the top 18 layers. Following Yang et al. (2023), we take one gradient descent step with the Adam optimizer (Kingma & Ba, 2014). Table 6 reports detailed hyper-parameters. All algorithms used Nucleus Sampling (Holtzman et al., 2019). Table 6 lists the sampling parameters (probability coverage, temperature, repetition penalty), the bias-tuning parameters (learning rate, top layers to tune), and the mixed weight τ, with specific values for GPT-2, OPT, and Falcon. |
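
The experiment-setup row describes bias-tuning of the top 18 layers, a single Adam step, and nucleus-sampling decoding. Below is a minimal PyTorch/Transformers sketch of that kind of setup, not the authors' code: the debiasing loss (here a plain LM loss), the learning rate, and the sampling values are placeholders, since the paper's Table 6 gives the actual per-model hyper-parameters.

```python
# Minimal sketch (assumptions marked): bias-tuning only the bias terms in the
# top layers of GPT-2, taking one Adam step, then decoding with nucleus sampling.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2-large")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-large")

TOP_LAYERS = 18        # "top 18 layers" per the reported setup
LEARNING_RATE = 1e-2   # placeholder; the paper's Table 6 lists per-model values

# Freeze all parameters, then unfreeze only the bias terms in the top layers.
for p in model.parameters():
    p.requires_grad = False
num_blocks = len(model.transformer.h)
tunable = []
for block in model.transformer.h[num_blocks - TOP_LAYERS:]:
    for name, p in block.named_parameters():
        if name.endswith("bias"):
            p.requires_grad = True
            tunable.append(p)

optimizer = torch.optim.Adam(tunable, lr=LEARNING_RATE)

prompt = "The people from that neighborhood are"
inputs = tokenizer(prompt, return_tensors="pt")

# One gradient-descent step on a placeholder objective (plain LM loss here;
# LIDAO/UDDIA define their own debiasing objectives).
loss = model(**inputs, labels=inputs["input_ids"]).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()

# Decode with nucleus (top-p) sampling; the exact probability coverage,
# temperature, and repetition penalty are given in the paper's Table 6.
with torch.no_grad():
    generated = model.generate(
        **inputs,
        do_sample=True,
        top_p=0.9,
        temperature=1.0,
        repetition_penalty=1.2,
        max_new_tokens=30,
        pad_token_id=tokenizer.eos_token_id,
    )
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```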