An Examination of Fairness of AI Models for Deepfake Detection
Authors: Loc Trinh, Yan Liu
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we evaluate bias present in deepfake datasets and detection models across protected subgroups. Using facial datasets balanced by race and gender, we examine three popular deepfake detectors and find large disparities in predictive performances across races, with up to 10.7% difference in error rate between subgroups. |
| Researcher Affiliation | Academia | Loc Trinh, Yan Liu, Department of Computer Science, University of Southern California {loctrinh, yanliu.cs}@usc.edu |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any links to open-source code for the methodology described in the paper or explicitly state that the code is released. |
| Open Datasets | Yes | We trained MesoInception4 [Afchar et al., 2018], Xception [Rössler et al., 2019], and Face X-Ray [Li et al., 2020] on the FaceForensics++ dataset, which contains four variants of face swaps. ... Auditing Datasets We utilized two face datasets labeled with demographic information: (1) Racial Face-in-the-Wild (RFW) [Wang et al., 2019] and (2) UTKFace [Zhang et al., 2017]. |
| Dataset Splits | No | The paper mentions 'Since the FaceForensics++ training dataset is heavily imbalanced, we set the threshold as the value in the range (0.01, 0.99, 0.01) that maximizes the balanced accuracy on the FaceForensics++ validation set.' However, it does not provide specific training/validation/test dataset splits (e.g., percentages or exact counts) needed to reproduce the overall experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | Since the FaceForensics++ training dataset is heavily imbalanced, we set the threshold as the value in the range (0.01, 0.99, 0.01) that maximizes the balanced accuracy on the FaceForensics++ validation set. |
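
The threshold-selection step quoted above (sweep 0.01 to 0.99 in steps of 0.01 and keep the value maximizing balanced accuracy on the validation set) can be sketched as follows. This is a minimal illustration, not the authors' code; the function names and the toy inputs are hypothetical, and binary labels (1 = fake) with per-sample prediction scores are assumed.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    # Balanced accuracy = mean of true-positive rate and true-negative rate,
    # which is robust to the class imbalance noted in the paper.
    tpr = np.mean(y_pred[y_true == 1] == 1)
    tnr = np.mean(y_pred[y_true == 0] == 0)
    return (tpr + tnr) / 2.0

def select_threshold(y_true, scores):
    # Sweep thresholds in the range (0.01, 0.99, 0.01) and return the
    # one that maximizes balanced accuracy on the validation labels.
    thresholds = np.arange(0.01, 1.00, 0.01)
    accs = [balanced_accuracy(y_true, (scores >= t).astype(int))
            for t in thresholds]
    best = int(np.argmax(accs))
    return thresholds[best], accs[best]

# Hypothetical validation labels and model scores:
y_val = np.array([0, 0, 1, 1])
s_val = np.array([0.1, 0.2, 0.8, 0.9])
t, acc = select_threshold(y_val, s_val)
```

On perfectly separable toy scores like the above, any threshold between the two classes yields a balanced accuracy of 1.0; on the imbalanced FaceForensics++ validation set the sweep matters, since the default 0.5 cutoff need not maximize balanced accuracy.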