Debiasing Algorithm through Model Adaptation
Authors: Tomasz Limisiewicz, David Mareček, Tomáš Musil
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform causal analysis to identify problematic model components and discover that mid-upper feed-forward layers are most prone to convey bias. Based on the analysis results, we intervene in the model by applying a linear projection to the weight matrices of these layers. Our titular method DAMA significantly decreases bias as measured by diverse metrics while maintaining the model's performance on downstream tasks. |
| Researcher Affiliation | Academia | Tomasz Limisiewicz, David Mareček, Tomáš Musil; Faculty of Mathematics and Physics, Charles University; {limisiewicz,marecek,musil}@ufal.mff.cuni.cz |
| Pseudocode | No | The paper describes its methodology in prose and mathematical equations but does not include any explicit pseudocode blocks or algorithms labeled as such. |
| Open Source Code | Yes | We release code for our method and models, which retain LLaMA's state-of-the-art performance while being significantly less biased. [...] The code is available at: github.com/tomlimi/DAMA |
| Open Datasets | Yes | We use the set of professions chosen and annotated by Bolukbasi et al. (2016). [...] The data is available at: https://github.com/tolga-b/debiaswe/blob/master/data/professions.json [...] To evaluate the performance of the model's pre-training task, we measure perplexity on the Wikipedia 103 corpus (Merity et al., 2016) available through Hugging Face. [...] WinoBias (Zhao et al., 2018) [...] StereoSet (Nadeem et al., 2021) [...] OpenBookQA (OBQA) (Mihaylov et al., 2018) [...] AI2 Reasoning Challenge (ARC) (Clark et al., 2018) [...] Massive Multitask Language Understanding (MMLU) (Hendrycks et al., 2021) |
| Dataset Splits | No | The paper mentions 'Test train split' for its custom dataset in Appendix C.1, specifying that a 'test set' is selected and the 'remainder... assigned to the train set.' However, it does not explicitly define a separate 'validation' split for this or any other dataset used in the experiments. |
| Hardware Specification | No | The paper mentions 'Due to hardware limitations, we could not run MMLU inference for 65B models.' (Table 2 footnote) but does not provide any specific details about the type or specifications of the hardware used for the experiments, such as CPU/GPU models or memory. |
| Software Dependencies | No | The paper mentions 'Huggingface library' and 'Adam scheduler (Kingma & Ba, 2015)' but does not specify version numbers for these or any other software dependencies, making it difficult to precisely reproduce the software environment. |
| Experiment Setup | Yes | We apply DAMA to MLPs in approximately one-third of the model's upper layers (in LLaMA 7B, layers 21–29 out of 32, with projection dimensionality dc = 256). [...] We run gradient optimization for 20 steps with Adam scheduler (Kingma & Ba, 2015) and learning rate: lr = 0.5. We picked the following regularization constants: λ1 = 0.0625 and λ2 = 0.2. [...] In our implementation, we used factor r = 8 and learning rate lr = 0.0001. [...] Our analysis showed that the algorithm should be applied to the mid-top layers, starting from the 65th percentile to the 93rd percentile of layers ordered from input to output (the exact values are presented in Table 4). A hedged code sketch of this setup follows the table. |
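
To make the experiment-setup row concrete, below is a minimal, hypothetical PyTorch sketch of the kind of edit it describes: selecting the mid-upper MLP layers by depth percentile (65th to 93rd, i.e. layers 21–29 of LLaMA 7B's 32 layers) and folding a rank-reducing linear projection of dimensionality dc = 256 into their down-projection weight matrices. The helper names, the orthogonal-complement form P = I − U Uᵀ, and the Hugging Face module path are illustrative assumptions, not the authors' exact implementation; DAMA derives its projections from the optimization quoted above (20 Adam steps at lr = 0.5, λ1 = 0.0625, λ2 = 0.2).

```python
import math
import torch


def select_layers(num_layers: int, lo_pct: float = 0.65, hi_pct: float = 0.93) -> range:
    """Pick mid-upper layers by depth percentile; for 32 layers this yields 21..29."""
    return range(math.ceil(num_layers * lo_pct), math.floor(num_layers * hi_pct) + 1)


def orthogonal_complement_projection(bias_basis: torch.Tensor) -> torch.Tensor:
    """P = I - U U^T removes the span of `bias_basis` (d_model x d_c, assumed orthonormal)."""
    d_model = bias_basis.shape[0]
    eye = torch.eye(d_model, dtype=bias_basis.dtype, device=bias_basis.device)
    return eye - bias_basis @ bias_basis.T


@torch.no_grad()
def apply_projection_to_mlps(model, bias_basis_per_layer: dict[int, torch.Tensor]) -> None:
    """Fold a projection into each selected MLP down-projection weight.

    Assumes a LLaMA-style Hugging Face model whose MLP output weight lives at
    model.model.layers[i].mlp.down_proj.weight with shape (d_model, d_ff); since
    P @ (W @ x) == (P @ W) @ x, the projection can be baked into the weights.
    """
    num_layers = model.config.num_hidden_layers
    for i in select_layers(num_layers):
        W = model.model.layers[i].mlp.down_proj.weight        # (d_model, d_ff)
        U = bias_basis_per_layer[i].to(W.device, W.dtype)     # (d_model, d_c), e.g. d_c = 256
        P = orthogonal_complement_projection(U)               # (d_model, d_model)
        W.copy_(P @ W)                                        # project the layer output away from the bias subspace
```

Under these assumptions, `bias_basis_per_layer[i]` would hold a 256-dimensional orthonormal basis for the directions identified as bias-carrying in layer i; in the released DAMA repository those directions are learned per layer rather than hand-specified, so this sketch only illustrates how the resulting projection is applied to the weight matrices.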