Robustness of AI-Image Detectors: Fundamental Limits and Practical Attacks

Authors: Mehrdad Saberi, Vinu Sankar Sadasivan, Keivan Rezaei, Aounon Kumar, Atoosa Chegini, Wenxiao Wang, Soheil Feizi

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To validate our theoretical findings, we also provide empirical evidence demonstrating that diffusion purification effectively removes low perturbation budget watermarks by applying minimal changes to images. Finally, we extend our theory to characterize a fundamental trade-off between the robustness and reliability of classifier-based deep fake detectors and demonstrate it through experiments." (A hedged diffusion-purification sketch follows the table.)
Researcher Affiliation | Academia | "Mehrdad Saberi, Vinu Sankar Sadasivan, Keivan Rezaei, Aounon Kumar, Atoosa Chegini, Wenxiao Wang, Soheil Feizi; Department of Computer Science, University of Maryland; {msaberi,vinu,krezaei,aounon,atoocheg,wwx,sfeizi}@umd.edu"
Pseudocode | Yes | "We provide the pseudocode for spoofing watermarks in Algorithm 1."
Open Source Code | Yes | "Code is available at https://github.com/mehrdadsaberi/watermark_robustness."
Open Datasets | Yes | "Our evaluation is conducted on a set of 100 images drawn from the ImageNet dataset (Russakovsky et al., 2015), and their watermarked counterparts using each method. We perform experiments on the images from the FaceForensics++ dataset hosted by Rössler et al. (2019) to verify our theoretical insights empirically." (A hedged data-preparation sketch follows the table.)
Dataset Splits | Yes | "Our substitute classifiers are trained for 10 epochs and receive higher than 99.8% accuracy on validation data. After preprocessing, our FaceSwap image dataset contains 4316 (1059, respectively) original and 3529 (1857, respectively) manipulated images in the training (test, respectively) dataset. Similarly, our DeepFakes image dataset contains 4316 (1059, respectively) original and 3522 (1843, respectively) manipulated images in the training (test, respectively) dataset."
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory, or processor types with speeds) used for running its experiments. While the experiments imply the use of computational resources, no concrete specifications are listed.
Software Dependencies | No | The paper mentions the use of models (e.g., ResNet-18, VGG-16-BN, ImageNet-pretrained models) and general frameworks (e.g., diffusion models), but does not list specific version numbers for any software dependencies, libraries, or frameworks required to reproduce the experiments (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | "Our substitute classifiers are trained for 10 epochs and receive higher than 99.8% accuracy on validation data. To launch adversarial attacks on images using substitute classifiers, we employ a PGD attack with 300 iterations and a step size denoted as α = 0.05ϵ. We train different detectors with the standard deviation of noise σ varied from 0 to 20." (Hedged PGD-attack and noise-training sketches follow the table.)
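
The sketches below illustrate the experimental components quoted in the table; none of them reproduces the paper's own code.

Diffusion purification (referenced in the Research Type row) removes a watermark by adding Gaussian noise to the image up to some diffusion timestep and then denoising it with a pretrained diffusion model. The following is a minimal sketch of that idea, assuming the Hugging Face diffusers DDPM API; the checkpoint and the timestep t_star are illustrative choices, not the paper's exact setup.

```python
import torch
from diffusers import DDPMPipeline

# An off-the-shelf unconditional DDPM (checkpoint chosen for illustration only).
pipe = DDPMPipeline.from_pretrained("google/ddpm-cifar10-32")
unet, scheduler = pipe.unet, pipe.scheduler

@torch.no_grad()
def diffusion_purify(image: torch.Tensor, t_star: int = 100) -> torch.Tensor:
    """Purify a batch of images in [-1, 1] with shape [B, C, H, W].

    Forward-noise the input up to timestep t_star, then run the reverse
    diffusion process back to t=0 with the pretrained denoiser.
    """
    noise = torch.randn_like(image)
    timesteps = torch.full((image.shape[0],), t_star, dtype=torch.long)
    # Forward process: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
    x_t = scheduler.add_noise(image, noise, timesteps)
    # Reverse process from t_star down to 1.
    for t in range(t_star, 0, -1):
        eps = unet(x_t, t).sample
        x_t = scheduler.step(eps, t, x_t).prev_sample
    return x_t  # purified image
```

Larger values of t_star erase larger watermark perturbations but also alter the image more, which is consistent with the quoted claim that low-perturbation-budget watermarks can be removed with minimal changes to the image.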
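The Open Datasets row describes the watermarking evaluation set: 100 images drawn from ImageNet plus their watermarked counterparts under each method. Below is a hedged preparation sketch; apply_watermark is a hypothetical stand-in for any one watermarking method, and the directory layout, file names, and seed are assumptions, not details from the paper.

```python
import random
from pathlib import Path
from PIL import Image

def build_eval_set(imagenet_dir: str, out_dir: str, apply_watermark,
                   n: int = 100, seed: int = 0) -> None:
    """Sample n ImageNet images and save (original, watermarked) pairs.

    apply_watermark is a hypothetical callable PIL.Image -> PIL.Image,
    standing in for whichever watermarking method is being evaluated.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = sorted(Path(imagenet_dir).rglob("*.JPEG"))
    random.seed(seed)
    for i, path in enumerate(random.sample(paths, n)):
        img = Image.open(path).convert("RGB")
        img.save(out / f"{i:03d}_orig.png")
        apply_watermark(img).save(out / f"{i:03d}_wm.png")
```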
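The Experiment Setup row quotes a PGD attack with 300 iterations and step size α = 0.05ϵ against substitute classifiers. Below is a generic ℓ∞ projected-gradient-descent sketch consistent with those hyperparameters, assuming a PyTorch classifier and inputs in [0, 1]; the default ϵ value is an assumption, and this is a standard formulation rather than a copy of the paper's implementation.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, n_iter=300, alpha_ratio=0.05):
    """L-infinity PGD on a substitute classifier.

    x: images in [0, 1]; y: labels (e.g., watermarked vs. non-watermarked).
    Step size alpha = 0.05 * eps and 300 iterations match the quoted setup.
    """
    alpha = alpha_ratio * eps
    # Random start inside the eps-ball, clipped to the valid pixel range.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(n_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)      # untargeted: push away from true label
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()            # gradient ascent step
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)   # project onto the eps-ball
        x_adv = x_adv.clamp(0, 1)                               # keep valid pixel range
    return x_adv.detach()
```

Per the quoted setup, the adversarial images are crafted on the substitute classifier and then used to attack the target detector.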
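The same row mentions training detectors with the noise standard deviation σ varied from 0 to 20. The sketch below shows one way such noise-augmented training could look; interpreting σ on the 0-255 pixel scale is my assumption, and the model, optimizer, learning rate, and data loader are placeholders rather than the paper's choices.

```python
import torch
import torch.nn.functional as F

def train_noisy_detector(model, loader, sigma, epochs=10, lr=1e-3, device="cuda"):
    """Train a binary real/fake detector with Gaussian noise augmentation.

    sigma is the noise standard deviation on the 0-255 pixel scale (assumption);
    inputs from `loader` are assumed to lie in [0, 1], so the noise is rescaled.
    """
    model = model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            x_noisy = (x + torch.randn_like(x) * sigma / 255.0).clamp(0, 1)
            loss = F.cross_entropy(model(x_noisy), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

# One detector per noise level, e.g. sigma in {0, 5, 10, 15, 20} (illustrative grid).
```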