Fourier Spectrum Discrepancies in Deep Network Generated Images

Authors: Tarik Dzanic, Karan Shah, Freddie Witherden

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we present an analysis of the high-frequency Fourier modes of real and deep network generated images and show that deep network generated images share an observable, systematic shortcoming in replicating the attributes of these high-frequency modes. Using this, we propose a detection method based on the frequency spectrum of the images which is able to achieve an accuracy of up to 99.2% in classifying real and deep network generated images... Furthermore, we show the impact of image transformations... and suggest a method for modifying... The results of the KNN classifier for image resolutions of 1024², 768² (cropped), and 256² with compression qualities of 100% (uncompressed), 95%, and 85% are shown in Table 2.
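For context on the quoted method, the sketch below shows one way to compute the reduced (azimuthally averaged) Fourier spectrum that the detection approach operates on. The `reduced_spectrum` helper and its binning scheme are illustrative assumptions, not the authors' code (none was released; see the Open Source Code row).

```python
import numpy as np

def reduced_spectrum(img, n_bins=100):
    """Azimuthally averaged magnitude spectrum of a 2-D grayscale image.

    A minimal sketch of the reduced-spectrum computation the paper builds
    on; the bin count and averaging details are assumptions.
    """
    mag = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    h, w = img.shape
    y, x = np.indices((h, w))
    # Radial wavenumber of every pixel, normalized so k = 1 at Nyquist
    r = np.hypot(y - h // 2, x - w // 2) / (min(h, w) / 2)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.digitize(r.ravel(), bins)
    spec = np.array([mag.ravel()[idx == i].mean() for i in range(1, n_bins + 1)])
    return bins[1:], spec  # bin edges (wavenumbers) and averaged magnitudes
```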
Researcher Affiliation | Academia | Tarik Dzanic, Department of Ocean Engineering, Texas A&M University, College Station, TX 77843, tdzanic@tamu.edu; Karan Shah, Department of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332, shah@gatech.edu; Freddie D. Witherden, Department of Ocean Engineering, Texas A&M University, College Station, TX 77843, fdw@tamu.edu
Pseudocode | No | No structured pseudocode or algorithm blocks were found. The paper describes the classification pipeline as a numbered list in Section 2.3.2, but it is not formatted as pseudocode.
Open Source Code | No | No explicit statement about the release of source code, or a link to a code repository for the methodology described in the paper, was found.
Open Datasets | Yes | Image samples were taken from datasets of real images and images generated by StyleGAN [1], StyleGAN2 [2], PGGAN [3], VQ-VAE2 [4] and ALAE [5] architectures... These datasets, shown in Table 1, are denoted by R, G, S, P, V, and A, respectively, with the subscript denoting the resolution. Examples from Table 1: R₁₀₂₄ FFHQ Faces, G₁₀₂₄ Karras et al. [1] Faces, V₁₀₂₄ Razavi et al. [4] Faces, R₂₅₆ Zhang et al. [19] Cats.
Dataset Splits | Yes | For the majority of the datasets, 10% of the images were used for training while the remaining 90% were used for testing to highlight the relatively low number of training examples required for classification. For the high-resolution VQ-VAE2 datasets (V₁₀₂₄/V₇₆₈), only a small number of high-resolution images were presented in the work by Razavi et al. [4], and therefore only 8 images were available for training and 9 for testing.
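A minimal sketch of the described 10%/90% split, assuming scikit-learn's `train_test_split`; the paper does not state whether stratification or a fixed random seed was used, so both are assumptions here.

```python
from sklearn.model_selection import train_test_split

# 10% of samples for training, 90% held out for testing, as described above.
# `features` and `labels` are placeholder arrays of per-image fit parameters
# and real/generated labels.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, train_size=0.1, stratify=labels, random_state=0)
```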
Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, memory) used for running the experiments were mentioned in the paper.
Software Dependencies | No | The paper mentions 'lossy JPEG compression with Python Imaging Library (Pillow)' but does not provide specific version numbers for Pillow or any other software dependencies, which is required for reproducibility.
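The compression step itself is straightforward to reproduce. Below is a minimal sketch of the described JPEG re-compression using Pillow's standard `quality` keyword, at the quality levels studied in the paper (95% and 85%); the filename is a placeholder, and no specific Pillow version is implied.

```python
from PIL import Image

# Convert to grayscale (the classifier uses the grayscale component) and
# re-save with lossy JPEG compression at the qualities from Table 2.
img = Image.open("sample.png").convert("L")
for q in (95, 85):
    img.save(f"sample_q{q}.jpg", format="JPEG", quality=q)
```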
Experiment Setup | Yes | A k-nearest neighbors (KNN) classifier with k = 5 was used for classification between real and deep network generated images with respect to the decay parameters (b1, b2) of the grayscale component of the images. A comparison of the reduced spectrum statistics of the grayscale-converted 1024² pixel images from the datasets in Table 1 is shown in Fig. 2, normalized by the spectrum at a threshold wavenumber k_T = 0.75.
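Putting the quoted setup together, the sketch below extracts two decay parameters from the normalized spectrum tail above k_T = 0.75 and feeds them to a k = 5 KNN classifier. The paper's exact decay model for (b1, b2) is not restated in this report, so a log-log power-law least-squares fit stands in as an assumption; the KNN configuration follows the stated k = 5.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

K_T = 0.75  # threshold wavenumber from the paper

def decay_features(k, spec):
    """Fit a power law to the high-frequency tail (k >= K_T) of a
    reduced spectrum and return the two fit coefficients as features.

    Assumed stand-in for the paper's decay parameters (b1, b2).
    """
    mask = k >= K_T
    tail = spec[mask] / spec[mask][0]  # normalize at k_T, as in Fig. 2
    # Linear fit in log-log space: log(tail) ~ slope * log(k) + intercept
    slope, intercept = np.polyfit(np.log(k[mask]), np.log(tail), 1)
    return np.array([slope, intercept])

# KNN with k = 5, as specified in the experiment setup; training uses the
# 10%/90% split sketched earlier.
clf = KNeighborsClassifier(n_neighbors=5)
# clf.fit(train_features, train_labels); clf.predict(test_features)
```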