FuzzE: Fuzzy Fairness Evaluation of Offensive Language Classifiers on African-American English
Authors: Anthony Rios
AAAI 2020, pp. 881-889
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To measure these problems, we need text written in both AAVE and Standard American English (SAE). Specifically, we propose an automated fairness fuzzing tool called FuzzE to quantify the fairness of text classifiers applied to AAVE text using a dataset that only contains text written in SAE. Overall, we find that the fairness estimates returned by our technique moderately correlate with the use of real ground-truth AAVE text. We conduct a detailed analysis of the framework using automatic style transfer evaluation metrics. Moreover, we measure the increase in well-known phonetic and syntactic AAVE constructions produced by different style transfer techniques after being applied to SAE text. We also perform a human evaluation study to measure semantic change (e.g., offensive to not-offensive) encountered by transforming the style of text. (A hypothetical sketch of this fuzzing loop appears after the table.) |
| Researcher Affiliation | Academia | Anthony Rios, Department of Information Systems and Cyber Security, University of Texas at San Antonio, anthony.rios@utsa.edu |
| Pseudocode | No | The paper describes the methods and workflow in prose and with diagrams (Figure 1), but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology (Fuzz E, style transfer methods) is publicly available. |
| Open Datasets | Yes | AAVE Dataset (Style Data): Blodgett, Green, and O'Connor (2016) originally collected and released more than 59.2 million tweets by 2.8 million users. Offensive Language Datasets: We investigate style transfer and fairness evaluation using two datasets: the Offensive Language Identification Dataset (OLID) (Zampieri et al. 2019) and the Hate Speech and Offensive Language (HSOL) dataset (Davidson et al. 2017). |
| Dataset Splits | No | The SAE tweets in both datasets are split into a training set (80%) and a test set (20%). While the paper mentions bootstrap sampling from the training split to create multiple models, it does not specify a distinct validation set or its proportion. (A split-and-bootstrap sketch appears after this table.) |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions various models and tools used (e.g., Logistic Regression, CNN, Bi-LSTM, Twokenizer, KenLM) and cites the relevant papers, but it does not provide version numbers for any software dependencies such as the programming language, libraries, or frameworks (e.g., Python, PyTorch, scikit-learn). |
| Experiment Setup | Yes | Using cross-validation, the regularization parameter is optimized for each dataset independently. We found the best regularization parameters for OLID and HSOL to be 0.1 and 1.0, respectively. For the model specification of the generator and encoder, we use a two-layer Bi-LSTM with a word embedding size of 300 and hidden dimension size of 500. The generator produces sequences of at most 50 tokens. The CNN classifier is trained with 100 filters that span 5 words. (A hyperparameter sketch appears after this table.) |
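
The abstract describes FuzzE's core idea: generate AAVE-style variants of SAE text and check whether the classifier's behavior changes. Since the paper includes no pseudocode (see the Pseudocode row), the following is only a minimal sketch of one plausible fuzzing loop; the names `style_transfer` and `classifier`, and the label-flip rate used as the metric, are assumptions rather than the paper's actual fairness measure.

```python
def fairness_fuzz(classifier, style_transfer, sae_texts):
    """Hypothetical fuzzing loop: compare predictions on SAE text and on
    its AAVE-style transformation (names and metric are assumptions)."""
    flips = 0
    for text in sae_texts:
        aave_like = style_transfer(text)  # SAE -> AAVE-style variant
        if classifier(text) != classifier(aave_like):
            flips += 1  # prediction changed by style alone
    return flips / len(sae_texts)  # fraction of style-induced label flips
```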
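
The Dataset Splits row reports an 80/20 train/test split with bootstrap sampling from the training portion to fit multiple models. Below is a minimal sketch of that setup using scikit-learn and NumPy; the placeholder data, random seed, and ensemble size are illustrative, since the paper does not specify them.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder corpus; in practice these would be the SAE tweets and labels
# from OLID or HSOL.
tweets = [f"tweet {i}" for i in range(1000)]
labels = np.random.randint(0, 2, size=1000)

# 80/20 train/test split as reported; the random seed is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    tweets, labels, test_size=0.2, random_state=42
)

# Bootstrap resampling from the training split to train multiple models;
# the paper does not state the ensemble size (10 here is a guess).
rng = np.random.default_rng(42)
for _ in range(10):
    idx = rng.choice(len(X_train), size=len(X_train), replace=True)
    X_boot = [X_train[i] for i in idx]
    y_boot = y_train[idx]
    # ... fit one classifier on (X_boot, y_boot) here
```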
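
The Experiment Setup row gives concrete hyperparameters: a cross-validated search over the logistic regression regularization strength (best values 0.1 on OLID and 1.0 on HSOL), a two-layer Bi-LSTM with 300-dimensional embeddings and a 500-dimensional hidden state, a 50-token maximum sequence length, and a CNN classifier with 100 filters spanning 5 words. The sketch below wires those reported numbers into scikit-learn and PyTorch layer definitions; the TF-IDF features, search grid, vocabulary size, and everything else not stated in the paper (dropout, batching, training loop) are assumptions.

```python
import torch.nn as nn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Cross-validated search over the regularization strength C; the paper
# reports best values of 0.1 (OLID) and 1.0 (HSOL). The TF-IDF features
# and the grid itself are assumptions.
lr_search = GridSearchCV(
    Pipeline([("tfidf", TfidfVectorizer()),
              ("clf", LogisticRegression(max_iter=1000))]),
    param_grid={"clf__C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
)

VOCAB_SIZE = 30_000                            # hypothetical vocabulary size
EMBED_DIM, HIDDEN_DIM, MAX_LEN = 300, 500, 50  # reported in the paper

# Two-layer Bi-LSTM encoder matching the reported dimensions.
embedding = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
encoder = nn.LSTM(EMBED_DIM, HIDDEN_DIM, num_layers=2,
                  bidirectional=True, batch_first=True)

# CNN classifier with 100 filters spanning 5 words, as reported.
conv = nn.Conv1d(in_channels=EMBED_DIM, out_channels=100, kernel_size=5)
```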