An Examination of Fairness of AI Models for Deepfake Detection

Authors: Loc Trinh, Yan Liu

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | "In this work, we evaluate bias present in deepfake datasets and detection models across protected subgroups. Using facial datasets balanced by race and gender, we examine three popular deepfake detectors and find large disparities in predictive performances across races, with up to 10.7% difference in error rate between subgroups." (A sketch of this per-subgroup disparity computation follows the table.)
Researcher Affiliation | Academia | Loc Trinh, Yan Liu, Department of Computer Science, University of Southern California, {loctrinh, yanliu.cs}@usc.edu
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not provide any links to open-source code for the methodology described in the paper, nor does it explicitly state that the code is released.
Open Datasets | Yes | "We trained MesoInception4 [Afchar et al., 2018], Xception [Rössler et al., 2019], and Face X-Ray [Li et al., 2020] on the FaceForensics++ dataset, which contains four variants of face swaps. ... Auditing Datasets: We utilized two face datasets labeled with demographic information: (1) Racial Faces in-the-Wild (RFW) [Wang et al., 2019] and (2) UTKFace [Zhang et al., 2017]."
Dataset Splits | No | The paper states: "Since the FaceForensics++ training dataset is heavily imbalanced, we set the threshold as the value in the range (0.01, 0.99, 0.01) that maximizes the balanced accuracy on the FaceForensics++ validation set." However, it does not provide specific training/validation/test splits (e.g., percentages or exact counts) needed to reproduce the overall experiments.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiments.
Experiment Setup | Yes | "Since the FaceForensics++ training dataset is heavily imbalanced, we set the threshold as the value in the range (0.01, 0.99, 0.01) that maximizes the balanced accuracy on the FaceForensics++ validation set." (A sketch of this threshold-selection step follows the table.)
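
The headline result in the Research Type row is a gap of up to 10.7% in error rate between racial subgroups. Below is a minimal sketch of how such a per-subgroup disparity could be computed; the function names, toy labels, and subgroup encoding are illustrative assumptions, since the paper does not release its evaluation code.

```python
# Sketch: per-subgroup error rates and the largest pairwise gap between them.
# All data here is synthetic; subgroup labels "A"/"B" are placeholders.
import numpy as np

def subgroup_error_rates(y_true, y_pred, groups):
    """Return a {group: error rate} dict for binary labels and predictions."""
    rates = {}
    for g in np.unique(groups):
        mask = groups == g
        rates[str(g)] = float(np.mean(y_true[mask] != y_pred[mask]))
    return rates

def max_disparity(rates):
    """Largest gap in error rate across subgroups."""
    vals = list(rates.values())
    return max(vals) - min(vals)

# Toy example with hypothetical subgroup membership.
y_true = np.array([0, 1, 0, 1, 0, 1, 0, 1])
y_pred = np.array([0, 1, 1, 1, 0, 0, 0, 1])
groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

rates = subgroup_error_rates(y_true, y_pred, groups)
print(rates, max_disparity(rates))
```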
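
The Experiment Setup row quotes a threshold search over the range (0.01, 0.99, 0.01), i.e., thresholds 0.01 through 0.99 in steps of 0.01, choosing the one that maximizes balanced accuracy on the validation set. Here is a minimal sketch of that grid search, assuming held-out detector scores in [0, 1]; the variable names and synthetic data are placeholders, not the authors' code.

```python
# Sketch: pick the decision threshold in {0.01, 0.02, ..., 0.99} that
# maximizes balanced accuracy on a validation set, as the paper describes.
import numpy as np
from sklearn.metrics import balanced_accuracy_score

def select_threshold(y_val, scores_val):
    """Grid-search the threshold that maximizes balanced accuracy."""
    best_t, best_bacc = 0.5, -1.0
    for t in np.arange(0.01, 1.00, 0.01):
        bacc = balanced_accuracy_score(y_val, (scores_val >= t).astype(int))
        if bacc > best_bacc:
            best_t, best_bacc = float(t), bacc
    return best_t, best_bacc

# Toy validation labels and scores from a hypothetical detector.
rng = np.random.default_rng(0)
y_val = rng.integers(0, 2, size=200)
scores_val = np.clip(y_val * 0.3 + rng.normal(0.35, 0.2, size=200), 0.0, 1.0)

print(select_threshold(y_val, scores_val))
```

Balanced accuracy averages the per-class recalls, which is why it is a sensible selection criterion on a heavily imbalanced training set like FaceForensics++.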