An Interpretable Evaluation of Entropy-based Novelty of Generative Models

Authors: Jingwei Zhang, Cheuk Ting Li, Farzan Farnia

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We support the KEN framework by presenting numerical results on synthetic and real image datasets, indicating the framework's effectiveness in detecting novel modes and comparing generative models.
Researcher Affiliation | Academia | Department of Computer Science and Engineering, The Chinese University of Hong Kong; Department of Information Engineering, The Chinese University of Hong Kong.
Pseudocode | Yes | Algorithm 1: Computation of KEN & novel mode centers.
Open Source Code | Yes | The paper's code is available at github.com/buyeah1109/KEN.
Open Datasets | Yes | We performed experiments on the following image datasets: 1) CIFAR-10 (Krizhevsky et al., 2009) with 60k images of 10 classes; 2) ImageNet-1K (Deng et al., 2009) with 1.4 million images of 1000 classes, containing 20k dog images from 120 different dog breeds; 3) CelebA (Liu et al., 2015) with 200k face images of celebrities; 4) FFHQ (Karras et al., 2019) with 70k human-face images; 5) AFHQ (Choi et al., 2020) with 15k animal-face images of dogs, cats, and wildlife, where the AFHQ-dog subset has 5k images from 8 dog breeds; 6) Wildlife dataset (Mehta, 2023) with 2k wild animal images.
Dataset Splits | No | The paper mentions "m, n = 5000 sample size for the test and reference data" but does not specify explicit training/validation/test splits or percentages, nor does it refer to a validation set for hyperparameter tuning in the context of reproducing the experiments.
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies | No | The paper mentions using specific pre-trained models and embeddings such as Inception-V3, DINOv2, and CLIP, but it does not provide version numbers for software dependencies such as Python, PyTorch, or other libraries used in the implementation.
Experiment Setup | Yes | In our experiments, we used a sample size of m, n = 5000 for the test and reference data, and chose parameter η = 1 for the KEN evaluation. For the synthetic experiments, we used σ = 0.5. We observed that σ ∈ [10, 15] could satisfy this requirement for all the tested image data with the Inception-V3 embedding.
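The setup above names the ingredients of an entropy-based novelty score: test and reference samples, a Gaussian kernel bandwidth σ, and a novelty-threshold parameter η. As a rough, hedged sketch of how such a score can be computed — this is not the authors' exact Algorithm 1, and the function names, weighting scheme, and numeric threshold below are illustrative assumptions — one can take the eigenvalues of a signed, weighted kernel matrix over the combined samples (whose nonzero eigenvalues match those of the kernel covariance difference C_test − η·C_ref) and compute the Shannon entropy of the normalized positive eigenvalues:

```python
import numpy as np

def gaussian_kernel(X, Y, sigma):
    # Pairwise Gaussian kernel values between rows of X and Y.
    d2 = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-d2 / (2.0 * sigma**2))

def novelty_entropy(test, ref, sigma, eta=1.0):
    """Sketch of an entropy-based novelty score (illustrative, not the
    paper's exact Algorithm 1)."""
    n, m = len(test), len(ref)
    Z = np.vstack([test, ref])
    K = gaussian_kernel(Z, Z, sigma)
    # Signed sample weights: +1/n for test points, -eta/m for reference
    # points. The nonzero eigenvalues of K @ diag(w) equal those of the
    # kernel covariance difference C_test - eta * C_ref.
    w = np.concatenate([np.full(n, 1.0 / n), np.full(m, -eta / m)])
    eigs = np.real(np.linalg.eigvals(K * w[None, :]))  # K @ diag(w)
    pos = eigs[eigs > 1e-8]           # keep the positive spectrum
    p = pos / pos.sum()               # normalize to a distribution
    return float(-np.sum(p * np.log(p)))  # Shannon entropy
```

Under this construction, test samples drawn from modes absent in the reference data contribute positive eigenvalues, so the score grows with the number and spread of novel modes; with η = 1 and σ = 0.5 it can be applied directly to the synthetic setting described in the table.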