Attribute Based Interpretable Evaluation Metrics for Generative Models

Authors: Dongkyun Kim, Mingi Kwon, Youngjung Uh

ICML 2024

Reproducibility Variable Result LLM Response
Research Type: Experimental. "We conduct a series of carefully controlled experiments with varying configurations of attributes to validate our metrics in Section 5.2 and 5.3. Then we provide different characteristics of state-of-the-art generative models (Karras et al., 2019; 2020b; 2021; Sauer et al., 2021; Nichol & Dhariwal, 2021; Rombach et al., 2022; Yang et al., 2023) which could not be seen in the existing metrics."
Researcher Affiliation: Collaboration. (1) Department of Artificial Intelligence, Yonsei University, Seoul, Republic of Korea; (2) AI Lab, CTO Division, LG Electronics, Seoul, Republic of Korea.
Pseudocode: No. No explicitly labeled pseudocode or algorithm blocks were found.
Open Source Code: Yes. Code: github.com/notou10/sadpad.
Open Datasets: Yes. The experiments use publicly available datasets such as LSUN-Cat and COCO. "For instance, GANs better synthesize color-/texture-related attributes such as striped fur which DMs hardly preserve in LSUN-Cat (Section 5.4)."
Dataset Splits: No. No explicit train/validation/test splits are specified.
Hardware Specification: Yes. "We used a single NVIDIA RTX 3090 GPU (24GB) for the experiments."
Software Dependencies: No. The paper mentions `scipy.stats.gaussian_kde`, `spacy.load("en_core_web_sm")`, and the "ViT-B/32" CLIP encoder, but gives no specific version numbers for these dependencies.
Experiment Setup: Yes. "For estimating the probability density function (PDF) of Heterogeneous CLIPScore (HCS) in both the training data and generated images, we use Gaussian Kernel Density Estimation (KDE). In this process, we extract 10,000 samples from generated and real images to obtain PDFs of attribute strengths. These PDFs are then used to compute SaD and PaD. In every experiment, we use a set of N_A = 20 attributes." To calculate SaD and PaD with attributes extracted from COCO captions, "we generate 30K images with captions from COCO using SDv1.5 and SDv2.1"; for that setting, N_A = 30.
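The KDE step of this setup can be sketched as follows. This is a minimal illustration, not the authors' implementation: the attribute-strength scores are synthetic stand-ins for Heterogeneous CLIPScore values, and a plain KL divergence on the estimated PDFs is used as an illustrative stand-in for the paper's SaD/PaD computation.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical attribute-strength scores for one attribute; the real
# pipeline extracts HCS values from 10,000 real and generated images.
rng = np.random.default_rng(0)
real_scores = rng.normal(0.0, 1.0, size=10_000)
gen_scores = rng.normal(0.3, 1.2, size=10_000)

# Fit Gaussian KDEs to obtain PDFs of attribute strengths,
# as described in the experiment details.
real_pdf = gaussian_kde(real_scores)
gen_pdf = gaussian_kde(gen_scores)

# Evaluate both densities on a shared grid, normalize, and compare
# them with a KL divergence (stand-in for the SaD/PaD divergence).
grid = np.linspace(-6.0, 6.0, 512)
p = real_pdf(grid) + 1e-12
q = gen_pdf(grid) + 1e-12
p /= p.sum()
q /= q.sum()
kl = float(np.sum(p * np.log(p / q)))
print(f"per-attribute divergence (illustrative): {kl:.4f}")
```

In the paper's setting this comparison would be repeated for each of the N_A attributes (and attribute pairs for PaD) rather than a single synthetic distribution.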