Debiasing Algorithm through Model Adaptation

Authors: Tomasz Limisiewicz, David Mareček, Tomáš Musil

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform causal analysis to identify problematic model components and discover that mid-upper feed-forward layers are most prone to convey bias. Based on the analysis results, we intervene in the model by applying a linear projection to the weight matrices of these layers. Our titular method DAMA significantly decreases bias as measured by diverse metrics while maintaining the model's performance on downstream tasks. (See the projection sketch after the table.)
Researcher Affiliation | Academia | Tomasz Limisiewicz, David Mareček, Tomáš Musil. Faculty of Mathematics and Physics, Charles University. {limisiewicz,marecek,musil}@ufal.mff.cuni.cz
Pseudocode | No | The paper describes its methodology in prose and mathematical equations but does not include any explicit pseudocode blocks or algorithms labeled as such.
Open Source Code | Yes | We release code for our method and models, which retain LLaMA's state-of-the-art performance while being significantly less biased. [...] The code is available at: github.com/tomlimi/DAMA
Open Datasets | Yes | We use the set of professions chosen and annotated by Bolukbasi et al. (2016). [...] The data is available at: https://github.com/tolga-b/debiaswe/blob/master/data/professions.json [...] To evaluate the performance of the model's pre-training task, we measure perplexity on the Wikipedia 103 corpus (Merity et al., 2016) available through Hugging Face. [...] WinoBias (Zhao et al., 2018) [...] StereoSet (Nadeem et al., 2021) [...] OpenBookQA (OBQA) (Mihaylov et al., 2018) [...] AI2 Reasoning Challenge (ARC) (Clark et al., 2018) [...] Massive Multitask Language Understanding (MMLU) (Hendrycks et al., 2021). (See the dataset-loading sketch after the table.)
Dataset Splits | No | The paper mentions a test/train split for its custom dataset in Appendix C.1, specifying that a test set is selected and the remainder is assigned to the train set. However, it does not explicitly define a separate validation split for this or any other dataset used in the experiments. (See the split sketch after the table.)
Hardware Specification | No | A Table 2 footnote states 'Due to hardware limitations, we could not run MMLU inference for 65B models,' but the paper gives no specifics about the hardware used for the experiments, such as CPU/GPU models or memory.
Software Dependencies | No | The paper mentions the Hugging Face library and the Adam scheduler (Kingma & Ba, 2015) but does not specify version numbers for these or any other software dependencies, making the software environment difficult to reproduce precisely.
Experiment Setup | Yes | We apply DAMA to MLPs in approximately one-third of the model's upper layers (in LLaMA 7B, layers 21-29 out of 32, with projection dimensionality dc = 256). [...] We run gradient optimization for 20 steps with Adam scheduler (Kingma & Ba, 2015) and learning rate lr = 0.5. We picked the following regularization constants: λ1 = 0.0625 and λ2 = 0.2. [...] In our implementation, we used factor r = 8 and learning rate lr = 0.0001. [...] Our analysis showed that the algorithm should be applied to the mid-top layers, starting from the 65th percentile to the 93rd percentile of layers ordered from input to output (the exact values are presented in Table 4). (See the configuration sketch after the table.)
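
The Research Type row describes DAMA's core intervention: a linear projection applied to the weight matrices of mid-upper feed-forward layers. Below is a minimal PyTorch sketch of that idea, assuming an orthonormal basis for the bias subspace has already been estimated (the paper derives its projection from causal analysis; the plain I - U U^T null-space construction and all names here are illustrative, not the authors' implementation):

```python
import torch

def project_out_bias(weight: torch.Tensor, bias_basis: torch.Tensor) -> torch.Tensor:
    """Project a bias subspace out of an MLP output weight matrix.

    weight:     (d_model, d_ff) feed-forward down-projection matrix.
    bias_basis: (d_model, d_c) orthonormal basis of the bias subspace
                (d_c = 256 in the paper's LLaMA 7B setup).
    """
    d_model = weight.shape[0]
    # P = I - U U^T maps any vector off the span of U; multiplying on the
    # left makes every output of the layer orthogonal to the bias subspace.
    projection = torch.eye(d_model) - bias_basis @ bias_basis.T
    return projection @ weight

# Shape-level demo on random tensors (not the real model weights).
W = torch.randn(4096, 11008)                    # LLaMA 7B MLP down-projection shape
U, _ = torch.linalg.qr(torch.randn(4096, 256))  # stand-in orthonormal basis
W_debiased = project_out_bias(W, U)
print((U.T @ W_debiased).abs().max())           # ~0: outputs carry no bias component
```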
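The Open Datasets row lists resources that are all publicly retrievable. A short sketch of loading two of them with standard tooling, assuming the Hugging Face `datasets` package and its usual `wikitext-103-raw-v1` config name (the paper does not state which WikiText-103 variant it used, and this is not the authors' released code):

```python
import json
import urllib.request

from datasets import load_dataset  # Hugging Face `datasets` package

# WikiText-103 (Merity et al., 2016), used in the paper for perplexity evaluation.
wikitext = load_dataset("wikitext", "wikitext-103-raw-v1", split="test")
print(wikitext[0]["text"][:80])

# Professions annotated by Bolukbasi et al. (2016), from the repository the paper cites.
url = ("https://raw.githubusercontent.com/tolga-b/debiaswe/"
       "master/data/professions.json")
with urllib.request.urlopen(url) as f:
    professions = json.load(f)
print(len(professions), "professions, e.g.", professions[0])
```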
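The Dataset Splits row describes only a two-way split: a test set is selected and the remainder goes to train. A minimal sketch of that procedure; the fraction, seed, and shuffling are assumptions, since the quoted text does not give Appendix C.1's exact selection rule:

```python
import random

def train_test_split(items, test_fraction=0.2, seed=0):
    """Select a test set and assign the remainder to the train set.

    test_fraction and seed are illustrative, not the paper's values.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

train, test = train_test_split(f"profession_{i}" for i in range(320))
print(len(train), "train /", len(test), "test")
```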
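The Experiment Setup row quotes the key hyperparameters. The sketch below gathers them in one place and reproduces the layer-selection arithmetic: the 65th-93rd percentile of LLaMA 7B's 32 layers gives layers 21-29. The dict layout, key names, and the ceil/floor rounding are assumptions chosen to match the quoted numbers:

```python
import math

# Hyperparameters quoted in the paper (LLaMA 7B). Key names are illustrative.
DAMA_CONFIG = {
    "projection_dim": 256,    # d_c
    "optim_steps": 20,        # gradient optimization steps with Adam
    "learning_rate": 0.5,
    "lambda_1": 0.0625,       # regularization constants
    "lambda_2": 0.2,
    "lora_rank": 8,           # r = 8 (LoRA baseline)
    "lora_learning_rate": 1e-4,
}

def target_layers(num_layers: int, lo_pct: float = 0.65, hi_pct: float = 0.93) -> range:
    """Mid-top layer range per the paper's 65th-93rd percentile rule.

    The ceil/floor rounding is an assumption that reproduces the quoted
    range for 32 layers; the paper's exact per-model values are in its Table 4.
    """
    return range(math.ceil(lo_pct * num_layers), math.floor(hi_pct * num_layers) + 1)

print(list(target_layers(32)))  # [21, 22, ..., 29] -> "layers 21-29 out of 32"
```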