Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
MABR: Multilayer Adversarial Bias Removal Without Prior Bias Knowledge
Authors: Maxwell J. Yin, Boyu Wang, Charles Ling
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments on racial and gender biases in sentiment and occupation classification tasks, our method effectively reduces social biases without the need for demographic annotations. Moreover, our approach not only matches but often surpasses the efficacy of methods that require detailed demographic insights, marking a significant advancement in bias mitigation techniques. We conduct experiments on two English NLP tasks and two types of social demographics: sentiment analysis with gender and occupation classification with race. Our MABR method successfully reduces bias, sometimes even outperforming methods that use demographic information. |
| Researcher Affiliation | Academia | Maxwell J. Yin, Boyu Wang, Charles Ling* University of Western Ontario EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Adversarial Training with Bias Detection and Mitigation |
| Open Source Code | Yes | Code https://github.com/maxwellyin/MABR |
| Open Datasets | Yes | Following the methodology of previous research (Elazar and Goldberg 2018; Orgad and Belinkov 2023), we employ a dataset from Blodgett, Green, and O'Connor (2016) that consists of 100,000 tweets to explore dialect differences in social media language. This dataset allows us to analyze racial identity by categorizing each tweet as either African American English (AAE) or Mainstream US English (MUSE), commonly referred to as Standard American English (SAE). The classification leverages the geographical information of the tweet authors. Additionally, Elazar and Goldberg (2018) used emojis embedded in tweets as sentiment indicators to facilitate the sentiment classification task. Following previous research (Orgad and Belinkov 2023), we utilize the dataset provided by De-Arteaga et al. (2019), which comprises 400,000 online biographies, to examine gender bias in occupational classification. |
| Dataset Splits | No | The paper describes the datasets used and the training process, including batch size and learning rates, but it does not specify the exact percentages or counts for the training, validation, or test splits of these datasets. It mentions that model selection is performed without a validation set containing demographic annotations, implying the splits may be implicit or standard for these datasets, but no explicit details are given. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It only mentions the use of BERT and DeBERTa-v1 as backbone models and the Huggingface Transformers library for implementation. |
| Software Dependencies | No | The paper mentions "We implement the MABR framework using the Huggingface Transformers library (Wolf et al. 2020)", but it does not specify the version number of this library or any other software dependencies like Python, PyTorch/TensorFlow, or CUDA. |
| Experiment Setup | Yes | The batch size is set to 64, enabling dynamic adversarial training per batch. We set the learning rate to 1e-3 for the bias detector and domain classifier, and 2e-5 for the model. The threshold τ is selected to ensure approximately 30% of samples fall outside it after initial training. For training epochs, we balance task accuracy and fairness using the distance to optimum (DTO) criterion introduced by Han, Baldwin, and Cohn (2022). Model selection is performed without a validation set with demographic annotations, choosing the largest epoch while limiting accuracy reduction. We use 0.98 of the maximum achieved accuracy on the task as the threshold to stop training. Other hyperparameters follow the default settings provided by the Transformers library. |
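The model-selection rule quoted in the Experiment Setup row (choose the largest epoch whose task accuracy stays within 0.98 of the maximum achieved accuracy) can be sketched as below. This is a minimal illustration of that stopping criterion, not the authors' released code; the function name and the list-of-accuracies interface are assumptions.

```python
def select_stopping_epoch(epoch_accuracies, ratio=0.98):
    """Return the largest epoch index whose task accuracy is still at
    least `ratio` of the best accuracy observed across all epochs.

    Illustrative sketch of the paper's stated rule; the actual MABR
    implementation may differ.
    """
    best = max(epoch_accuracies)
    threshold = ratio * best
    # Scan from the last epoch backwards and keep the latest epoch
    # whose accuracy has not fallen below the threshold.
    for epoch in range(len(epoch_accuracies) - 1, -1, -1):
        if epoch_accuracies[epoch] >= threshold:
            return epoch
    return 0

# Accuracy peaks at epoch 1 (0.85); the threshold is 0.98 * 0.85 ≈ 0.833,
# so epoch 2 (accuracy 0.84) is the last epoch that stays above it.
print(select_stopping_epoch([0.80, 0.85, 0.84, 0.83, 0.70]))  # → 2
```

Selecting the *largest* qualifying epoch (rather than the peak-accuracy epoch) trades a small amount of task accuracy for additional debiasing epochs, consistent with the accuracy/fairness balance the row describes.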