ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations
Authors: Kailas Vodrahalli, James Zou
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through this game, we recorded over 50,000 human-AI interactions; each interaction corresponds to one text prompt created by a user and the corresponding generated image. The majority of these are repeated interactions where a user iterates to find the best prompt for their target image, making this a unique sequential dataset for studying human-AI collaborations. In an initial analysis of this dataset, we identify several characteristics of prompt interactions and user strategies. Using this setup, we collected data on 51,026 interactions from 2,250 players across 191 unique target images. (An illustrative record sketch follows the table.) |
| Researcher Affiliation | Academia | 1Stanford University. Correspondence to: Kailas Vodrahalli <kailasv@stanford.edu>, James Zou <jamesz@stanford.edu>. |
| Pseudocode | Yes | Algorithm 1 Image Steerability Estimation (a hedged sketch follows the table) |
| Open Source Code | Yes | Our dataset and associated code is made available at https://github.com/kailas-v/ArtWhisperer. |
| Open Datasets | Yes | Our dataset and associated code is made available at https://github.com/kailas-v/ArtWhisperer. Our contributions We release a public dataset on human interactions with an AI model. |
| Dataset Splits | No | The paper uses "ArtWhisperer-Validation" as a separate dataset to assess the robustness of its findings and mentions a "train" split for fine-tuning a synthetic prompter in Appendix A.15. However, beyond that proof-of-concept train/test split, it does not explicitly provide the full training/validation/test splits needed to reproduce its primary experiments or any models it directly trains. |
| Hardware Specification | No | The paper mentions the generative model used ("SD v2.1") and its settings, but does not provide specific hardware details (e.g., GPU models, CPU types, or memory specifications) used for running the experiments or the game server. |
| Software Dependencies | No | The paper mentions various models and methods used (e.g., "SD v2.1", "DPM Multi-step Scheduler", "MT0-large model", "IA3 method") and libraries/embeddings (e.g., "CLIP text embedding"), but it does not provide specific version numbers for the software dependencies required to reproduce the experiments. |
| Experiment Setup | Yes | For the generative model, we use SD v2.1 (Rombach et al., 2022b) with the DPM Multi-step Scheduler (Lu et al., 2022) and run the model for 20 iterations. AI-generated target images use the same parameters but run for 50 iterations. All images are generated at size 512 × 512. For the synthetic prompter in Appendix A.15: training for 30 epochs and using a linearly decaying learning rate starting from 10^-3 with the AdamW optimizer (Loshchilov & Hutter, 2017). (A generation sketch follows the table.) |
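The dataset row above describes each interaction as one user prompt plus the generated image, collected in per-player, per-target sequences. As an illustration only, one interaction record might look like the following; the field names here are assumptions rather than the released schema, which is documented at https://github.com/kailas-v/ArtWhisperer.

```python
# Hypothetical record layout for one interaction, inferred from the paper's
# description (one text prompt plus the generated image, in a sequence of
# repeated attempts per player per target). Not the released schema.
from dataclasses import dataclass

@dataclass
class Interaction:
    player_id: str        # anonymized player identifier (2,250 players total)
    target_image_id: str  # one of the 191 unique target images
    query_index: int      # position in the player's prompt-iteration sequence
    prompt: str           # text prompt the player submitted
    generated_image: str  # path or URL of the resulting generated image
    score: float          # similarity score between generated and target image
```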
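The paper's Algorithm 1 (Image Steerability Estimation) is referenced in the Pseudocode row but not reproduced in this report. The sketch below is one hedged reading, not the authors' algorithm: it treats steerability as the mean number of prompt iterations before a player's similarity score first reaches a threshold. The threshold value and the censoring convention for players who never cross it are assumptions.

```python
# Illustrative steerability estimator, NOT the paper's Algorithm 1.
# Steerability is read here as the mean number of queries a player needs
# before their similarity score first crosses a threshold.
from typing import Sequence

def estimate_steerability(
    score_trajectories: Sequence[Sequence[float]],  # one score list per player
    threshold: float = 0.8,  # hypothetical success threshold
) -> float:
    """Mean number of queries until the score first reaches `threshold`.

    Trajectories that never cross are counted at their full length
    (a simple censoring convention; the paper may handle this differently).
    """
    steps = []
    for traj in score_trajectories:
        crossed = next(
            (i + 1 for i, s in enumerate(traj) if s >= threshold), len(traj)
        )
        steps.append(crossed)
    return sum(steps) / len(steps)

# Example: three players iterating toward one target image.
trajectories = [
    [0.41, 0.63, 0.82],        # crosses on query 3
    [0.55, 0.85],              # crosses on query 2
    [0.30, 0.42, 0.50, 0.61],  # never crosses; censored at 4 queries
]
print(estimate_steerability(trajectories, threshold=0.8))  # -> 3.0
```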
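The generation settings in the Experiment Setup row map naturally onto the Hugging Face diffusers API. The sketch below assumes the public stabilityai/stable-diffusion-2-1 checkpoint and diffusers' DPMSolverMultistepScheduler; the paper does not specify its serving stack, so treat this as a minimal reconstruction of the stated parameters rather than the authors' code.

```python
# Minimal sketch of the stated generation settings: SD v2.1 with the DPM
# multi-step scheduler (Lu et al., 2022), 512x512 output, 20 denoising
# iterations for player-facing images and 50 for AI-generated targets.
# Assumes the Hugging Face `diffusers` library and a public checkpoint.
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")
# Swap the default scheduler for the DPM multi-step scheduler.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

prompt = "a castle on a hill at sunset"  # hypothetical user prompt

# Player-facing generation: 20 iterations at 512 x 512.
image = pipe(prompt, num_inference_steps=20, height=512, width=512).images[0]

# AI-generated target image: same parameters but 50 iterations.
target = pipe(prompt, num_inference_steps=50, height=512, width=512).images[0]
```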