Human-Guided Fair Classification for Natural Language Processing
Authors: Florian E. Dorner, Momchil Peychev, Nikola Konstantinov, Naman Goel, Elliott Ash, Martin Vechev
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results, based on a large dataset for online content moderation, show that in this context our pipeline effectively generates a set of candidate pairs that covers more diverse perturbations than existing word replacement based approaches and successfully leverages human feedback to verify and filter these candidate pairs. |
| Researcher Affiliation | Academia | ETH Zurich; MPI for Intelligent Systems, Tübingen; University of Oxford. Correspondence to: florian.dorner@tuebingen.mpg.de |
| Pseudocode | No | The paper describes methods in text but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | We provide code to reproduce our generation pipeline and our experiments on synthetic data, as well as our dataset of human fairness judgments at https://github.com/eth-sri/fairness-feedback-nlp |
| Open Datasets | Yes | We focus on toxicity classification on the Jigsaw Civil Comments dataset. The dataset contains around 2 million online comments s, as well as labels toxic(s) indicating the fraction of human labelers that considered comment s toxic. We define binary classification labels y(s) := toxic(s) > 0.5. A subset D of the Civil Comments dataset also contains labels A_j(s) that indicate the fraction of human labelers that think comment s mentions the demographic group j. We again define binary classification labels as y_j(s) := A_j(s) > 0.5 for these comments, and use them to train our group-presence classifier c. We only consider the subset D' ⊆ D for which no NaN values are contained in the dataset, and the RoBERTa-tokenized version of s does not exceed a length of 64 tokens. We furthermore split D' into a training set containing 75% of D' and a test set containing the other 25%. To build the pool Ce of candidate pairs, for word replacement and style transfer, we attempt to produce modified comments s'_{j'} mentioning group j' for each s ∈ D', for all demographic groups j with y_j(s) = 1 and all possible target groups j'. For GPT-3, we use a subset of D' due to limited resources. We then combine 42,500 randomly selected pairs (s, s') with s in the training part of D' for word replacement and style transfer each, and a total of 15,000 pairs (s, s') for our three GPT-3 approaches, to form the set of candidate constraints Ce. We similarly construct a set of test constraints of a fourth of Ce's size from the test portion of D'. More technical details can be found in App. B. https://www.kaggle.com/competitions/jigsaw-unintended-bias-in-toxicity-classification/data (A data-preparation sketch follows the table.) |
| Dataset Splits | No | We furthermore split D' into a training set containing 75% of D' and a test set containing the other 25%. The paper explicitly defines training and test splits but does not mention a validation split percentage or a specific partitioning for a validation set from the main dataset. |
| Hardware Specification | No | The paper mentions models like RoBERTa and BART, but does not specify the hardware (e.g., GPU/CPU models) used for training or inference. |
| Software Dependencies | Yes | All of our experiments involving transformer language models use the Hugging Face transformers library (Wolf et al., 2020). We accessed GPT-3 using OpenAI's API. For our first approach, we used the "text-davinci-001" version of GPT-3 in a zero-shot manner... The second approach was based on the beta version of GPT-3's editing mode. Here, s' is produced using the model "text-davinci-edit-001"... (An API-call sketch follows the table.) |
| Experiment Setup | Yes | We train c for 3 epochs with a batch size of 16 and use the Adam optimizer (Kingma & Ba, 2015) with learning rate 0.00001 to optimize the binary cross-entropy loss, reweighted by relative label frequency in the dataset. The BART-based generator g is trained starting from the pretrained facebook/bart-large model for a single epoch with batch size 4, again using Adam and a learning rate of 0.00001. We used temperature = 0.7 and top_p = 1 in all our approaches and used max_tokens = 64 for "text-davinci-001" to control the length of the modified sentence s'. (A training-loop sketch follows the table.) |
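The dataset preparation quoted above (binarizing the toxicity and group-presence fractions at 0.5, filtering NaN values and comments longer than 64 RoBERTa tokens, and a 75%/25% split) can be summarized in code. The sketch below is an illustration, not the authors' implementation: the file name `train.csv`, the column names (`comment_text`, `target`, and the identity columns), and the choice of the `roberta-base` tokenizer are assumptions based on the public Kaggle release.

```python
# Illustrative sketch of the data preparation described above.
# Assumptions (not from the paper): the Kaggle CSV is "train.csv" with columns
# "comment_text", "target" (fraction of raters who labelled the comment toxic),
# and per-group identity columns such as those listed below.
import pandas as pd
from sklearn.model_selection import train_test_split
from transformers import AutoTokenizer

GROUP_COLUMNS = ["male", "female", "muslim", "christian", "black", "white"]  # example subset

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
df = pd.read_csv("train.csv")

# Restrict to rows with identity annotations and no NaN values (the subset D').
df = df.dropna(subset=["target"] + GROUP_COLUMNS)

# Binarize: y(s) := toxic(s) > 0.5 and y_j(s) := A_j(s) > 0.5.
df["y"] = (df["target"] > 0.5).astype(int)
for col in GROUP_COLUMNS:
    df[f"y_{col}"] = (df[col] > 0.5).astype(int)

# Keep only comments whose RoBERTa tokenization has at most 64 tokens.
lengths = [len(tokenizer(text)["input_ids"]) for text in df["comment_text"]]
df = df[pd.Series(lengths, index=df.index) <= 64]

# Split D' into 75% training and 25% test data.
train_df, test_df = train_test_split(df, test_size=0.25, random_state=0)
```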
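The GPT-3 usage described under Software Dependencies and Experiment Setup maps onto two endpoints of the legacy openai-python client (pre-1.0, now deprecated). In the sketch below, only the model names and the sampling parameters (temperature 0.7, top_p 1, max_tokens 64) come from the quoted text; the prompt and instruction wording are placeholders, not the paper's actual prompts.

```python
# Sketch of the two GPT-3 generation modes quoted above, using the legacy
# openai-python (<1.0) client. Prompt/instruction text is a placeholder.
import openai

openai.api_key = "..."  # set via an environment variable in practice

comment = "Example comment mentioning group A."

# Zero-shot completion with "text-davinci-001"; max_tokens bounds the rewritten comment.
completion = openai.Completion.create(
    model="text-davinci-001",
    prompt=f"Rewrite the following comment so that it refers to group B instead:\n{comment}\n",
    temperature=0.7,
    top_p=1,
    max_tokens=64,
)
rewritten = completion["choices"][0]["text"]

# Editing mode with "text-davinci-edit-001".
edit = openai.Edit.create(
    model="text-davinci-edit-001",
    input=comment,
    instruction="Change the mentioned demographic group from group A to group B.",
    temperature=0.7,
    top_p=1,
)
edited = edit["choices"][0]["text"]
```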
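The training configuration for the group-presence classifier c (3 epochs, batch size 16, Adam with learning rate 1e-5, reweighted binary cross-entropy) could look roughly like the following PyTorch + transformers sketch. The `roberta-base` backbone, the number of group labels, the dummy data, and the reading of the reweighting as a per-label `pos_weight` are assumptions; the BART generator g would be fine-tuned analogously from facebook/bart-large for one epoch with batch size 4 and the same optimizer settings.

```python
# Self-contained sketch of the classifier training configuration quoted above
# (3 epochs, batch size 16, Adam, lr 1e-5, reweighted binary cross-entropy).
# The tiny dummy dataset and the exact reweighting scheme are placeholders.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

NUM_GROUPS = 6  # placeholder: one output per demographic-group label

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=NUM_GROUPS, problem_type="multi_label_classification"
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

# One possible reading of "reweighed by relative label frequency":
# weight positive examples of each label by their inverse frequency.
pos_weight = torch.ones(NUM_GROUPS)  # replace with (#negatives / #positives) per label
loss_fn = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)

# Dummy stand-in for the tokenized training split of D'.
texts = ["example comment one", "example comment two"] * 8
labels = torch.randint(0, 2, (len(texts), NUM_GROUPS)).float()
enc = tokenizer(texts, padding="max_length", truncation=True, max_length=64,
                return_tensors="pt")
loader = DataLoader(TensorDataset(enc["input_ids"], enc["attention_mask"], labels),
                    batch_size=16, shuffle=True)

model.train()
for epoch in range(3):
    for input_ids, attention_mask, y in loader:
        optimizer.zero_grad()
        logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
        loss_fn(logits, y).backward()
        optimizer.step()
```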