The Role of ImageNet Classes in Fréchet Inception Distance

Authors: Tuomas Kynkäänniemi, Tero Karras, Miika Aittala, Timo Aila, Jaakko Lehtinen

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We investigate a root cause of these discrepancies, and visualize what FID looks at in generated images. We show that the feature space that FID is (typically) computed in is so close to the ImageNet classifications that aligning the histograms of Top-N classifications between sets of generated and real images can reduce FID substantially without actually improving the quality of results. Thus, we conclude that FID is prone to intentional or accidental distortions.
Researcher Affiliation | Collaboration | Tuomas Kynkäänniemi (Aalto University, tuomas.kynkaanniemi@aalto.fi); Tero Karras (NVIDIA, tkarras@nvidia.com); Miika Aittala (NVIDIA, maittala@nvidia.com); Timo Aila (NVIDIA, taila@nvidia.com); Jaakko Lehtinen (Aalto University & NVIDIA, jlehtinen@nvidia.com)
Pseudocode | Yes | Algorithm 1 shows the pseudocode for our resampling method.
Open Source Code | Yes | Code is available at https://github.com/kynkaat/role-of-imagenet-classes-in-fid.
Open Datasets | Yes | We study this in the same context as Sauer et al. (2021) by training a Projected FastGAN (Liu et al., 2021; Sauer et al., 2021) that uses an ImageNet pre-trained EfficientNet (Tan & Le, 2019) as a feature extractor of the discriminator, and compare it against StyleGAN2 in FFHQ.
Dataset Splits | Yes | Following standard practice, we compute FID against the training set, using 50k randomly chosen real and generated images and the official TensorFlow version of Inception-V3.
Hardware Specification | Yes | We use a 32 GB NVIDIA Tesla V100 GPU to run our resampling experiments.
Software Dependencies | No | The paper mentions using the 'TensorFlow' and 'PyTorch' frameworks and specific models/codebases such as 'Inception-V3', 'ResNet-50', 'CLIP', and 'StyleGAN2', but it does not provide specific version numbers for these software dependencies (e.g., 'PyTorch 1.9').
Experiment Setup | Yes | In our experiments, we use StyleGAN2 auto-config trained in 256×256 resolution without adaptive discriminator augmentation (ADA). The only exception is AFHQ-V2 DOG, where we enable ADA and train in 512×512 resolution. [...] We use learning rate α = 10.0 when optimizing pre-logits features and α = 5.0 when optimizing logits or binarized class probabilities. We optimize the weights until convergence, which typically requires 100k iterations.
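The FID values discussed above are Fréchet distances between Gaussians fitted to Inception-V3 features of the 50k real and 50k generated images. A minimal NumPy/SciPy sketch of that distance, with feature extraction omitted (the function name and interface here are our own, not the paper's code):

```python
import numpy as np
from scipy import linalg

def fid(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * sqrt(sigma1 @ sigma2))."""
    diff = mu1 - mu2
    # Matrix square root of the covariance product; may pick up tiny
    # imaginary components from numerical error, which we discard.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2)
                 - 2.0 * np.trace(covmean))
```

In practice `mu` and `sigma` would be the mean and covariance of 2048-dimensional Inception pool features over each 50k-image set; identical distributions give a distance of zero.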
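The resampling idea behind Algorithm 1 (aligning classification histograms between generated and real sets) can be illustrated with a simplified Top-1 sketch; this is our own hypothetical illustration, not the paper's algorithm, which operates on Top-N histograms and optimized sampling weights:

```python
import numpy as np

def resample_to_match(gen_classes, real_classes, rng=None):
    """Pick indices into the generated set so that the Top-1 class
    histogram of the selection matches that of the real set.
    Simplified sketch: classes absent from the generated set are skipped."""
    rng = np.random.default_rng(0) if rng is None else rng
    # Bucket generated images by their predicted class.
    by_class = {}
    for i, c in enumerate(gen_classes):
        by_class.setdefault(int(c), []).append(i)
    selected = []
    classes, counts = np.unique(real_classes, return_counts=True)
    for c, n in zip(classes, counts):
        pool = by_class.get(int(c), [])
        if pool:
            # Sample with replacement so each class quota can be filled.
            selected.extend(rng.choice(pool, size=n, replace=True).tolist())
    return selected
```

Matching the histogram this way lowers FID without touching image quality, which is exactly the fragility the paper demonstrates.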