The Role of ImageNet Classes in Fréchet Inception Distance

Authors: Tuomas Kynkäänniemi, Tero Karras, Miika Aittala, Timo Aila, Jaakko Lehtinen

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We investigate a root cause of these discrepancies, and visualize what FID looks at in generated images. We show that the feature space that FID is (typically) computed in is so close to the ImageNet classifications that aligning the histograms of Top-N classifications between sets of generated and real images can reduce FID substantially without actually improving the quality of results. Thus, we conclude that FID is prone to intentional or accidental distortions.
Researcher Affiliation | Collaboration | Tuomas Kynkäänniemi (Aalto University, tuomas.kynkaanniemi@aalto.fi); Tero Karras (NVIDIA, tkarras@nvidia.com); Miika Aittala (NVIDIA, maittala@nvidia.com); Timo Aila (NVIDIA, taila@nvidia.com); Jaakko Lehtinen (Aalto University & NVIDIA, jlehtinen@nvidia.com)
Pseudocode | Yes | Algorithm 1 shows the pseudocode for our resampling method.
Open Source Code | Yes | Code is available at https://github.com/kynkaat/role-of-imagenet-classes-in-fid.
Open Datasets | Yes | We study this in the same context as Sauer et al. (2021) by training a Projected FastGAN (Liu et al., 2021; Sauer et al., 2021) that uses an ImageNet pre-trained EfficientNet (Tan & Le, 2019) as a feature extractor of the discriminator, and compare it against StyleGAN2 in FFHQ.
Dataset Splits | Yes | Following standard practice, we compute FID against the training set, using 50k randomly chosen real and generated images and the official TensorFlow version of Inception-V3.
Hardware Specification | Yes | We use a 32 GB NVIDIA Tesla V100 GPU to run our resampling experiments.
Software Dependencies | No | The paper mentions using the 'TensorFlow' and 'PyTorch' frameworks and specific models/codebases such as 'Inception-V3', 'ResNet-50', 'CLIP', and 'StyleGAN2', but it does not provide specific version numbers for these software dependencies (e.g., 'PyTorch 1.9').
Experiment Setup | Yes | In our experiments, we use StyleGAN2 auto-config trained in 256×256 resolution without adaptive discriminator augmentation (ADA). The only exception is AFHQ-V2 DOG, where we enable ADA and train in 512×512 resolution. [...] We use learning rate α = 10.0 when optimizing pre-logits features and α = 5.0 when optimizing logits or binarized class probabilities. We optimize the weights until convergence, which typically requires 100k iterations.
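The FID values discussed above are Fréchet distances between Gaussians fitted to Inception-V3 features of the 50k real and 50k generated images. A minimal NumPy/SciPy sketch of that distance, with feature extraction omitted (the function name and interface here are our own, not the paper's code):

```python
import numpy as np
from scipy import linalg

def fid(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * sqrt(sigma1 @ sigma2))."""
    diff = mu1 - mu2
    # Matrix square root of the covariance product; may pick up tiny
    # imaginary components from numerical error, which we discard.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2)
                 - 2.0 * np.trace(covmean))
```

In practice `mu` and `sigma` would be the mean and covariance of 2048-dimensional Inception pool features over each 50k-image set; identical distributions give a distance of zero.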
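The resampling idea behind Algorithm 1 (aligning classification histograms between generated and real sets) can be illustrated with a simplified Top-1 sketch; this is our own hypothetical illustration, not the paper's algorithm, which operates on Top-N histograms and optimized sampling weights:

```python
import numpy as np

def resample_to_match(gen_classes, real_classes, rng=None):
    """Pick indices into the generated set so that the Top-1 class
    histogram of the selection matches that of the real set.
    Simplified sketch: classes absent from the generated set are skipped."""
    rng = np.random.default_rng(0) if rng is None else rng
    # Bucket generated images by their predicted class.
    by_class = {}
    for i, c in enumerate(gen_classes):
        by_class.setdefault(int(c), []).append(i)
    selected = []
    classes, counts = np.unique(real_classes, return_counts=True)
    for c, n in zip(classes, counts):
        pool = by_class.get(int(c), [])
        if pool:
            # Sample with replacement so each class quota can be filled.
            selected.extend(rng.choice(pool, size=n, replace=True).tolist())
    return selected
```

Matching the histogram this way lowers FID without touching image quality, which is exactly the fragility the paper demonstrates.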