HYPE: A Benchmark for Human eYe Perceptual Evaluation of Generative Models
Authors: Sharon Zhou, Mitchell L. Gordon, Ranjay Krishna, Austin Narcomey, Li Fei-Fei, Michael S. Bernstein
NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test HYPE across six state-of-the-art generative adversarial networks and two sampling techniques on conditional and unconditional image generation using four datasets: CelebA, FFHQ, CIFAR-10, and ImageNet. We find that HYPE can track the relative improvements between models, and we confirm via bootstrap sampling that these measurements are consistent and replicable. (A bootstrap sketch follows the table.) |
| Researcher Affiliation | Academia | Sharon Zhou, Mitchell L. Gordon, Ranjay Krishna, Austin Narcomey, Li Fei-Fei, Michael S. Bernstein. Stanford University. {sharonz, mgord, ranjaykrishna, aon2, feifeili, msb}@cs.stanford.edu |
| Pseudocode | No | The paper describes methods in text and uses diagrams, but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | No | We deploy HYPE at https://hype.stanford.edu, where researchers can upload a model and retrieve a HYPE score. This describes a deployed service, not necessarily open-source code for the methodology. |
| Open Datasets | Yes | Datasets. We evaluate on four datasets. (1) CelebA-64 [37]... (2) FFHQ-1024 [26]... (3) CIFAR-10 [31]... (4) ImageNet-5... from the ImageNet dataset [13] |
| Dataset Splits | No | The paper describes the datasets used to train the GANs and the sampling strategy for the human evaluation (50 real and 50 fake images), but does not specify explicit train/validation/test splits, either for training the underlying models or for the evaluation itself. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types) used to run the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers needed to replicate the experiments. |
| Experiment Setup | Yes | Image exposures are in the range [100ms, 1000ms], derived from the perception literature [17]. All blocks begin at 500ms and last for 150 images (50% generated, 50% real)... Exposure times are raised at 10ms increments and reduced at 30ms decrements, following the 3-up/1-down adaptive staircase approach... (A staircase sketch follows the table.) |
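
The Experiment Setup row quotes the paper's 3-up/1-down adaptive staircase over exposure times. The sketch below is a minimal illustration of that kind of procedure, not the authors' implementation: the function names, the simulated worker, and the exact trigger rule (three consecutive correct answers shorten the exposure by 30ms, each error lengthens it by 10ms) are assumptions layered on the quoted parameters.

```python
import random

# Hypothetical sketch of a 3-up/1-down adaptive staircase using the paper's
# quoted parameters (500ms start, [100ms, 1000ms] range, 10ms increments,
# 30ms decrements, 150-image blocks). The trigger rule below is an assumed
# reading of "3-up/1-down", not confirmed against the authors' code.

MIN_MS, MAX_MS = 100, 1000   # exposure range from the perception literature
START_MS = 500               # all blocks begin at 500ms
UP_MS, DOWN_MS = 10, 30      # 10ms increments, 30ms decrements

def run_staircase(is_correct, n_images=150):
    """Yield the exposure time used for each of n_images trials.

    is_correct(exposure_ms) -> bool records one worker judgment
    (real vs. generated) at the given exposure time.
    """
    exposure = START_MS
    streak = 0  # consecutive correct answers
    for _ in range(n_images):
        yield exposure
        if is_correct(exposure):
            streak += 1
            if streak == 3:  # three correct in a row: shorten exposure (harder)
                exposure = max(MIN_MS, exposure - DOWN_MS)
                streak = 0
        else:                # one error: lengthen exposure (easier)
            exposure = min(MAX_MS, exposure + UP_MS)
            streak = 0

# Toy usage: a simulated worker who is more accurate at longer exposures.
if __name__ == "__main__":
    sim = lambda ms: random.random() < 0.4 + 0.5 * (ms - MIN_MS) / (MAX_MS - MIN_MS)
    times = list(run_staircase(sim))
    print(f"final exposure after {len(times)} images: {times[-1]}ms")
```

A staircase of this shape converges toward the exposure time at which a worker's accuracy stabilizes, which is what makes the converged time usable as a perceptual score.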
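
The Research Type row notes that the paper confirms consistency of its measurements via bootstrap sampling. The sketch below shows one standard way to bootstrap a confidence interval over per-evaluator scores; the assumption that the score is a mean of per-evaluator fool rates, and all names here, are illustrative rather than taken from the paper.

```python
import numpy as np

# Minimal sketch of bootstrap resampling for a HYPE-style score, assuming the
# reported score is the mean of per-evaluator "fool" rates (fraction of
# generated images judged real). Variable names are illustrative.

def bootstrap_ci(per_evaluator_scores, n_boot=10_000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    scores = np.asarray(per_evaluator_scores, dtype=float)
    # Resample evaluators with replacement and recompute the mean score.
    means = rng.choice(scores, size=(n_boot, len(scores))).mean(axis=1)
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return scores.mean(), (lo, hi)

# Toy usage: 30 evaluators' fool rates drawn at random.
if __name__ == "__main__":
    fool_rates = np.random.default_rng(1).uniform(0.1, 0.5, size=30)
    mean, (lo, hi) = bootstrap_ci(fool_rates)
    print(f"HYPE-style score: {mean:.3f} (95% CI [{lo:.3f}, {hi:.3f}])")
```

Two bootstrap runs over the same evaluator pool producing overlapping intervals is the sense in which such a measurement is "consistent and replicable".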