Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Improved Precision and Recall Metric for Assessing Generative Models
Authors: Tuomas Kynkäänniemi, Tero Karras, Samuli Laine, Jaakko Lehtinen, Timo Aila
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our metric in StyleGAN and BigGAN by providing several illustrative examples where existing metrics yield uninformative or contradictory results. Furthermore, we analyze multiple design variants of StyleGAN to better understand the relationships between the model architecture, training methods, and the properties of the resulting sample distribution. We demonstrate the effectiveness of our metric using two recent generative models (Section 3), StyleGAN [12] and BigGAN [4]. |
| Researcher Affiliation | Collaboration | Tuomas Kynkäänniemi (Aalto University, NVIDIA); Tero Karras (NVIDIA); Samuli Laine (NVIDIA); Jaakko Lehtinen (Aalto University, NVIDIA); Timo Aila (NVIDIA) |
| Pseudocode | Yes | See Appendix A in the supplement for pseudocode. |
| Open Source Code | Yes | Source code of our metric is available at https://github.com/kynkaat/improved-precision-and-recall-metric. |
| Open Datasets | Yes | We examine two state-of-the-art generative models, StyleGAN [12] trained with the FFHQ dataset, and BigGAN [4] trained on ImageNet [5]. |
| Dataset Splits | No | The paper discusses training and testing of models, but it does not explicitly provide details about a validation dataset split (e.g., percentages, sample counts, or explicit mention of a validation set). |
| Hardware Specification | No | The paper does not provide any specific hardware details such as CPU or GPU models used for running the experiments. |
| Software Dependencies | No | The paper references various models and frameworks (e.g., VGG-16, Inception-v3), but it does not list any specific software dependencies or libraries with version numbers required to replicate the experiments. |
| Experiment Setup | Yes | Thus we use k = 3 and |Φ| = 50000 in all our experiments unless stated otherwise. We use StyleGAN [12] in all experiments, trained with FFHQ at 1024×1024. Reducing the γ parameter by 100 shifts the balance even further (C). |
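For context, the paper's metric estimates precision and recall by approximating the real and generated feature manifolds with k-nearest-neighbor hyperspheres (k = 3 in the quoted setup). The sketch below illustrates that idea on raw feature arrays; it is a minimal illustration only, not the authors' released implementation (which operates on VGG-16 features and uses the hyperparameters quoted above), and all function names here are ours.

```python
import numpy as np

def knn_radii(feats, k=3):
    """Radius of each point's hypersphere: distance to its k-th nearest neighbor."""
    d = np.linalg.norm(feats[:, None] - feats[None, :], axis=-1)
    # Column 0 of the sorted distances is the self-distance (0), so index k
    # gives the k-th nearest *other* point.
    return np.sort(d, axis=1)[:, k]

def manifold_coverage(query, ref, k=3):
    """Fraction of query points lying inside at least one reference hypersphere."""
    radii = knn_radii(ref, k)
    d = np.linalg.norm(query[:, None] - ref[None, :], axis=-1)
    return float(np.mean(np.any(d <= radii[None, :], axis=1)))

def precision_recall(real_feats, fake_feats, k=3):
    """Precision: fake samples covered by the real manifold; recall: vice versa."""
    precision = manifold_coverage(fake_feats, real_feats, k)
    recall = manifold_coverage(real_feats, fake_feats, k)
    return precision, recall
```

With identically distributed real and fake features both scores approach 1; a fake distribution far from the real one drives precision toward 0, which is the asymmetry the metric is designed to expose.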