ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations

Authors: Kailas Vodrahalli, James Zou

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Through this game, we recorded over 50,000 human-AI interactions; each interaction corresponds to one text prompt created by a user and the corresponding generated image. The majority of these are repeated interactions where a user iterates to find the best prompt for their target image, making this a unique sequential dataset for studying human-AI collaborations. In an initial analysis of this dataset, we identify several characteristics of prompt interactions and user strategies. Using this setup, we collected data on 51,026 interactions from 2,250 players across 191 unique target images." |
| Researcher Affiliation | Academia | "Stanford University. Correspondence to: Kailas Vodrahalli <kailasv@stanford.edu>, James Zou <jamesz@stanford.edu>." |
| Pseudocode | Yes | "Algorithm 1: Image Steerability Estimation" |
| Open Source Code | Yes | "Our dataset and associated code is made available at https://github.com/kailas-v/ArtWhisperer." |
| Open Datasets | Yes | "Our dataset and associated code is made available at https://github.com/kailas-v/ArtWhisperer." "Our contributions: We release a public dataset on human interactions with an AI model." |
| Dataset Splits | No | The paper uses "ArtWhisperer-Validation" as a separate dataset for assessing the robustness of its findings, and mentions a train split for fine-tuning a synthetic prompter in Appendix A.15. However, it does not explicitly provide the full training/validation/test splits needed to reproduce its primary experiments or the models it directly trains, beyond a train/test split for the proof-of-concept synthetic prompter. |
| Hardware Specification | No | The paper names the generative model used ("SD v2.1") and its settings, but does not report the hardware (e.g., GPU models, CPU types, or memory) used to run the experiments or the game server. |
| Software Dependencies | No | The paper names the models and methods used (e.g., "SD v2.1", the "DPM Multi-step Scheduler", the "MT0-large" model, the "IA3" method) and embeddings (e.g., "CLIP text embedding"), but does not give version numbers for the software dependencies required to reproduce the experiments. |
| Experiment Setup | Yes | "For the generative model, we use SD v2.1 (Rombach et al., 2022b) with the DPM Multi-step Scheduler (Lu et al., 2022) and run the model for 20 iterations. AI-generated target images use the same parameters but run for 50 iterations. All images are generated at size 512 × 512." For the synthetic prompter in Appendix A.15: training for 30 epochs with a linearly decaying learning rate starting from 10⁻³ and the AdamW optimizer (Loshchilov & Hutter, 2017). Illustrative sketches of these settings follow the table. |
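The Experiment Setup row pins down the generation parameters well enough to sketch the pipeline. Below is a minimal sketch, assuming the Hugging Face diffusers library and the public stabilityai/stable-diffusion-2-1 checkpoint; the paper does not name its software stack, so treat this as an illustration rather than the authors' released code (which is at the GitHub link above).

```python
# Minimal sketch of the image-generation settings described above.
# Assumptions: the Hugging Face `diffusers` library and the public
# `stabilityai/stable-diffusion-2-1` checkpoint stand in for the paper's
# unspecified software stack.
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16,
)
# Swap in the DPM multi-step solver (Lu et al., 2022) named in the paper.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

# 20 denoising iterations for in-game images; target images use 50.
image = pipe(
    "a watercolor painting of a lighthouse at dusk",  # example prompt, not from the dataset
    num_inference_steps=20,
    height=512,
    width=512,
).images[0]
image.save("generated.png")
```

The Appendix A.15 optimizer settings (30 epochs, AdamW, learning rate decaying linearly from 10⁻³) can likewise be sketched in PyTorch. The model and data below are stand-ins; the paper fine-tunes an MT0-large prompter with IA3, which is omitted here for brevity.

```python
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import LinearLR

# Stand-in model and data; substitute the MT0-large prompter and the
# ArtWhisperer train split to mirror Appendix A.15.
model = nn.Linear(16, 1)
data = [(torch.randn(8, 16), torch.randn(8, 1)) for _ in range(10)]
loss_fn = nn.MSELoss()

EPOCHS = 30
optimizer = AdamW(model.parameters(), lr=1e-3)
# Linear decay from 1e-3 toward 0 across the whole run.
scheduler = LinearLR(
    optimizer,
    start_factor=1.0,
    end_factor=0.0,
    total_iters=EPOCHS * len(data),
)

for epoch in range(EPOCHS):
    for x, y in data:
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```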