Fourier Spectrum Discrepancies in Deep Network Generated Images
Authors: Tarik Dzanic, Karan Shah, Freddie D. Witherden
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we present an analysis of the high-frequency Fourier modes of real and deep network generated images and show that deep network generated images share an observable, systematic shortcoming in replicating the attributes of these high-frequency modes. Using this, we propose a detection method based on the frequency spectrum of the images which is able to achieve an accuracy of up to 99.2% in classifying real and deep network generated images... Furthermore, we show the impact of image transformations... and suggest a method for modifying... The results of the KNN classifier for image resolutions of 1024², 768² (cropped), and 256² with compression qualities of 100% (uncompressed), 95%, and 85% are shown in Table 2. |
| Researcher Affiliation | Academia | Tarik Dzanic, Department of Ocean Engineering, Texas A&M University, College Station, TX 77843, tdzanic@tamu.edu; Karan Shah, Department of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332, shah@gatech.edu; Freddie D. Witherden, Department of Ocean Engineering, Texas A&M University, College Station, TX 77843, fdw@tamu.edu |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. The paper describes the classification pipeline as a numbered list in Section 2.3.2, but it is not formatted as pseudocode. |
| Open Source Code | No | No explicit statement about the release of source code or a link to a code repository for the methodology described in the paper was found. |
| Open Datasets | Yes | Image samples were taken from datasets of real images and images generated by StyleGAN [1], StyleGAN2 [2], PGGAN [3], VQ-VAE2 [4] and ALAE [5] architectures... These datasets, shown in Table 1, are denoted by R, G, S, P, V, and A, respectively, with the subscript denoting the resolution. Examples from Table 1: R₁₀₂₄ FFHQ Faces, G₁₀₂₄ Karras et al. [1] Faces, V₁₀₂₄ Razavi et al. [4] Faces, R₂₅₆ Zhang et al. [19] Cats. |
| Dataset Splits | Yes | For the majority of the datasets, 10% of the images were used for training while the remaining 90% were used for testing to highlight the relatively low number of training examples required for classification. For the high-resolution VQ-VAE2 datasets (V₁₀₂₄/V₇₆₈), only a small number of high-resolution images were presented in the work by Razavi et al. [4], and therefore only 8 images were available for training and 9 for testing. |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, memory) used for running the experiments were mentioned in the paper. |
| Software Dependencies | No | The paper mentions 'lossy JPEG compression with Python Imaging Library (Pillow)' but does not provide specific version numbers for Pillow or any other software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | A k-nearest neighbors (KNN) classifier with k = 5 was used for classification between real and deep network generated images with respect to the decay parameters (b₁, b₂) of the grayscale component of the images. A comparison of the reduced spectrum statistics of the grayscale-converted 1024² pixel images from the datasets in Table 1 is shown in Fig. 2, normalized by the spectrum at a threshold wavenumber k_T = 0.75. |
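
Since the paper releases no source code (see the Open Source Code row), the following is a minimal Python sketch of the feature-extraction stage suggested by the Experiment Setup row. Everything not quoted above is an assumption: the reduced spectrum is taken to be the azimuthally averaged 2D power spectrum (a common convention, not a detail confirmed by the quotes), the two decay parameters (b₁, b₂) are fit with a placeholder model b₁·exp(−b₂k) above the threshold wavenumber k_T = 0.75, and all function names are hypothetical.

```python
# Hypothetical sketch of the reduced-spectrum feature extraction; the paper's
# exact decay model and normalization are not quoted here, so an exponential
# placeholder b1 * exp(-b2 * k) is assumed.
import numpy as np
from PIL import Image
from scipy.optimize import curve_fit

def reduced_spectrum(path):
    """Azimuthally averaged 2D power spectrum of a grayscale-converted image."""
    img = np.asarray(Image.open(path).convert("L"), dtype=np.float64)
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    cy, cx = power.shape[0] // 2, power.shape[1] // 2
    y, x = np.indices(power.shape)
    r = np.hypot(y - cy, x - cx).astype(int)      # integer annulus index
    sums = np.bincount(r.ravel(), weights=power.ravel())
    counts = np.bincount(r.ravel())
    radial = sums / np.maximum(counts, 1)         # mean power per annulus
    k_max = min(cy, cx)                           # Nyquist radius
    k = np.arange(k_max) / k_max                  # normalized wavenumber in [0, 1)
    return k, radial[:k_max]

def decay_parameters(k, spectrum, k_t=0.75):
    """Fit the assumed decay model b1 * exp(-b2 * k) above the threshold k_T."""
    mask = k >= k_t
    s = spectrum[mask] / spectrum[mask][0]        # normalize at the threshold
    (b1, b2), _ = curve_fit(lambda x, b1, b2: b1 * np.exp(-b2 * x),
                            k[mask], s, p0=(1.0, 1.0))
    return b1, b2
```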
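
The classification stage can then be sketched with scikit-learn. The 10%/90% train/test split and k = 5 are quoted in the Dataset Splits and Experiment Setup rows; the use of scikit-learn, the stratified split, and the function name are our assumptions.

```python
# Hypothetical KNN classification over the fitted (b1, b2) decay parameters;
# scikit-learn is an assumed dependency, not one named by the paper.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def knn_accuracy(features, labels, seed=0):
    """features: (n_images, 2) array of (b1, b2); labels: 0 = real, 1 = generated."""
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, train_size=0.1,         # 10% train / 90% test, as quoted
        stratify=labels, random_state=seed)
    knn = KNeighborsClassifier(n_neighbors=5)     # k = 5, as quoted
    knn.fit(X_train, y_train)
    return knn.score(X_test, y_test)              # accuracy on the held-out 90%
```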
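
Finally, the lossy JPEG compression noted in the Software Dependencies row can be reproduced with Pillow; the quality values 95 and 85 are quoted above, while the helper name is hypothetical.

```python
# Recompress an image with lossy JPEG at a given quality, as in the paper's
# robustness experiments (qualities 100/95/85); function name is hypothetical.
from PIL import Image

def jpeg_compress(src_path, dst_path, quality=95):
    Image.open(src_path).convert("RGB").save(dst_path, format="JPEG", quality=quality)
```

Note that the paper describes its 100% setting as uncompressed, which presumably refers to the original image files rather than quality=100 JPEG output.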