An Examination of Fairness of AI Models for Deepfake Detection

Authors: Loc Trinh, Yan Liu

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | "In this work, we evaluate bias present in deepfake datasets and detection models across protected subgroups. Using facial datasets balanced by race and gender, we examine three popular deepfake detectors and find large disparities in predictive performances across races, with up to 10.7% difference in error rate between subgroups." (A sketch of this per-subgroup disparity computation follows the table.)
Researcher Affiliation | Academia | Loc Trinh, Yan Liu, Department of Computer Science, University of Southern California, {loctrinh, yanliu.cs}@usc.edu
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not provide any links to open-source code for the methodology described in the paper, nor does it explicitly state that the code is released.
Open Datasets | Yes | "We trained MesoInception4 [Afchar et al., 2018], Xception [Rössler et al., 2019], and Face X-Ray [Li et al., 2020] on the FaceForensics++ dataset, which contains four variants of face swaps. ... Auditing Datasets: We utilized two face datasets labeled with demographic information: (1) Racial Faces in-the-Wild (RFW) [Wang et al., 2019] and (2) UTKFace [Zhang et al., 2017]."
Dataset Splits | No | The paper states: "Since the FaceForensics++ training dataset is heavily imbalanced, we set the threshold as the value in the range (0.01, 0.99, 0.01) that maximizes the balanced accuracy on the FaceForensics++ validation set." However, it does not provide specific training/validation/test splits (e.g., percentages or exact counts) needed to reproduce the overall experiments.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiments.
Experiment Setup | Yes | "Since the FaceForensics++ training dataset is heavily imbalanced, we set the threshold as the value in the range (0.01, 0.99, 0.01) that maximizes the balanced accuracy on the FaceForensics++ validation set." (A sketch of this threshold-selection step follows the table.)
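
The headline result in the Research Type row is a gap of up to 10.7% in error rate between racial subgroups. Below is a minimal sketch of how such a per-subgroup disparity could be computed; the function names, toy labels, and subgroup encoding are illustrative assumptions, since the paper does not release its evaluation code.

```python
# Sketch: per-subgroup error rates and the largest pairwise gap between them.
# All data here is synthetic; subgroup labels "A"/"B" are placeholders.
import numpy as np

def subgroup_error_rates(y_true, y_pred, groups):
    """Return a {group: error rate} dict for binary labels and predictions."""
    rates = {}
    for g in np.unique(groups):
        mask = groups == g
        rates[str(g)] = float(np.mean(y_true[mask] != y_pred[mask]))
    return rates

def max_disparity(rates):
    """Largest gap in error rate across subgroups."""
    vals = list(rates.values())
    return max(vals) - min(vals)

# Toy example with hypothetical subgroup membership.
y_true = np.array([0, 1, 0, 1, 0, 1, 0, 1])
y_pred = np.array([0, 1, 1, 1, 0, 0, 0, 1])
groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

rates = subgroup_error_rates(y_true, y_pred, groups)
print(rates, max_disparity(rates))
```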
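
The Experiment Setup row quotes a threshold search over the range (0.01, 0.99, 0.01), i.e., thresholds 0.01 through 0.99 in steps of 0.01, choosing the one that maximizes balanced accuracy on the validation set. Here is a minimal sketch of that grid search, assuming held-out detector scores in [0, 1]; the variable names and synthetic data are placeholders, not the authors' code.

```python
# Sketch: pick the decision threshold in {0.01, 0.02, ..., 0.99} that
# maximizes balanced accuracy on a validation set, as the paper describes.
import numpy as np
from sklearn.metrics import balanced_accuracy_score

def select_threshold(y_val, scores_val):
    """Grid-search the threshold that maximizes balanced accuracy."""
    best_t, best_bacc = 0.5, -1.0
    for t in np.arange(0.01, 1.00, 0.01):
        bacc = balanced_accuracy_score(y_val, (scores_val >= t).astype(int))
        if bacc > best_bacc:
            best_t, best_bacc = float(t), bacc
    return best_t, best_bacc

# Toy validation labels and scores from a hypothetical detector.
rng = np.random.default_rng(0)
y_val = rng.integers(0, 2, size=200)
scores_val = np.clip(y_val * 0.3 + rng.normal(0.35, 0.2, size=200), 0.0, 1.0)

print(select_threshold(y_val, scores_val))
```

Balanced accuracy averages the per-class recalls, which is why it is a sensible selection criterion on a heavily imbalanced training set like FaceForensics++.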