An Interpretable Evaluation of Entropy-based Novelty of Generative Models
Authors: Jingwei Zhang, Cheuk Ting Li, Farzan Farnia
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We support the KEN framework by presenting numerical results on synthetic and real image datasets, indicating the framework's effectiveness in detecting novel modes and comparing generative models. |
| Researcher Affiliation | Academia | 1Department of Computer Science and Engineering, The Chinese University of Hong Kong, 2Department of Information Engineering, The Chinese University of Hong Kong. |
| Pseudocode | Yes | Algorithm 1 Computation of KEN & novel mode centers |
| Open Source Code | Yes | The paper's code is available at: github.com/buyeah1109/KEN. |
| Open Datasets | Yes | We performed experiments on the following image datasets: 1) CIFAR-10 (Krizhevsky et al., 2009) with 60k images of 10 classes, 2) ImageNet-1K (Deng et al., 2009) with 1.4 million images of 1000 classes, containing 20k dog images from 120 different dog breeds, 3) CelebA (Liu et al., 2015) with 200k face images of celebrities, 4) FFHQ (Karras et al., 2019) with 70k human-face images, 5) AFHQ (Choi et al., 2020) with 15k animal-face images of dogs, cats, and wildlife. The AFHQ-dog subset has 5k images from 8 dog breeds. 6) Wildlife dataset (Mehta, 2023) with 2k wild animal images. |
| Dataset Splits | No | The paper mentions "m, n = 5000 sample size for the test and reference data" but does not specify explicit training/validation/test dataset splits or percentages, nor does it refer to a validation set for hyperparameter tuning in the context of reproducing the experiment. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper mentions using specific pre-trained models and embeddings like Inception-V3, DINOv2, and CLIP, but it does not provide specific version numbers for software dependencies such as Python, PyTorch, or other libraries used in the implementation. |
| Experiment Setup | Yes | In our experiments, we used sample sizes m, n = 5000 for the test and reference data and chose the parameter η = 1 for the KEN evaluation. For the synthetic experiments, we used σ = 0.5. For the image experiments, we observed that σ ∈ [10, 15] could satisfy this requirement for all the tested image data with the Inception-V3 embedding. |
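To make the reported hyperparameters (Gaussian kernel bandwidth σ, novelty threshold η, sample sizes m = n) concrete, the following is a minimal sketch of an entropy-over-positive-eigenvalues novelty score in the spirit of the KEN framework. It builds the block matrix that shares its nonzero spectrum with the kernel covariance difference C_test − η·C_ref; the normalization and exact entropy definition are assumptions here, and the paper's Algorithm 1 and released code should be consulted for the precise construction.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma):
    """Gaussian (RBF) kernel matrix between the rows of X and Y."""
    d2 = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-d2 / (2.0 * sigma**2))

def ken_sketch(X_test, X_ref, sigma=0.5, eta=1.0, eps=1e-8):
    """Illustrative novelty score (NOT the paper's exact formula).

    The block matrix below has the same nonzero eigenvalues as the
    kernel covariance difference C_test - eta * C_ref; its positive
    eigenvalues correspond to modes over-expressed in the test data,
    and we summarize them with a Shannon-style entropy.
    """
    n, m = len(X_test), len(X_ref)
    K_tt = gaussian_kernel(X_test, X_test, sigma)
    K_tr = gaussian_kernel(X_test, X_ref, sigma)
    K_rr = gaussian_kernel(X_ref, X_ref, sigma)
    top = np.hstack([K_tt / n, K_tr / n])
    bot = np.hstack([-eta * K_tr.T / m, -eta * K_rr / m])
    A = np.vstack([top, bot])
    # The underlying covariance-difference operator is self-adjoint,
    # so its nonzero eigenvalues are real; discard numerical imaginary parts.
    eigs = np.real(np.linalg.eigvals(A))
    pos = eigs[eigs > eps]
    if pos.size == 0:
        return 0.0
    p = pos / pos.sum()
    return float(-np.sum(p * np.log(p)))
```

With the paper's settings, the call would look like `ken_sketch(X_test, X_ref, sigma=0.5, eta=1.0)` on synthetic data, or σ in the 10 to 15 range on Inception-V3 embeddings of image data.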