ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations
Authors: Kailas Vodrahalli, James Zou
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through this game, we recorded over 50,000 human-AI interactions; each interaction corresponds to one text prompt created by a user and the corresponding generated image. The majority of these are repeated interactions where a user iterates to find the best prompt for their target image, making this a unique sequential dataset for studying human-AI collaborations. In an initial analysis of this dataset, we identify several characteristics of prompt interactions and user strategies. Using this setup, we collected data on 51,026 interactions from 2,250 players across 191 unique target images. (An illustrative record sketch follows the table.) |
| Researcher Affiliation | Academia | 1Stanford University. Correspondence to: Kailas Vodrahalli <kailasv@stanford.edu>, James Zou <jamesz@stanford.edu>. |
| Pseudocode | Yes | Algorithm 1 Image Steerability Estimation (a hedged sketch follows the table) |
| Open Source Code | Yes | Our dataset and associated code is made available at https://github.com/kailas-v/ArtWhisperer. |
| Open Datasets | Yes | Our dataset and associated code is made available at https://github.com/kailas-v/ArtWhisperer. Our contributions We release a public dataset on human interactions with an AI model. |
| Dataset Splits | No | The paper uses "ArtWhisperer-Validation" as a separate dataset to assess the robustness of its findings and mentions a "train" split for fine-tuning a synthetic prompter in Appendix A.15. However, beyond that proof-of-concept train/test split, it does not explicitly provide the full training/validation/test splits needed to reproduce its primary experiments or any models it directly trains. |
| Hardware Specification | No | The paper mentions the generative model used ("SD v2.1") and its settings, but does not provide specific hardware details (e.g., GPU models, CPU types, or memory specifications) used for running the experiments or the game server. |
| Software Dependencies | No | The paper mentions various models and methods used (e.g., "SD v2.1", "DPM Multi-step Scheduler", "MT0-large model", "IA3 method") and libraries/embeddings (e.g., "CLIP text embedding"), but it does not provide specific version numbers for the software dependencies required to reproduce the experiments. |
| Experiment Setup | Yes | For the generative model, we use SD v2.1 (Rombach et al., 2022b) with the DPM Multi-step Scheduler (Lu et al., 2022) and run the model for 20 iterations. AI-generated target images use the same parameters but run for 50 iterations. All images are generated at size 512 × 512. For the synthetic prompter in Appendix A.15: training for 30 epochs and using a linearly decaying learning rate starting from 10^-3 with the AdamW optimizer (Loshchilov & Hutter, 2017). (A generation sketch follows the table.) |
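The dataset row above describes each interaction as one user prompt plus the generated image, collected in per-player, per-target sequences. As an illustration only, one interaction record might look like the following; the field names here are assumptions rather than the released schema, which is documented at https://github.com/kailas-v/ArtWhisperer.

```python
# Hypothetical record layout for one interaction, inferred from the paper's
# description (one text prompt plus the generated image, in a sequence of
# repeated attempts per player per target). Not the released schema.
from dataclasses import dataclass

@dataclass
class Interaction:
    player_id: str        # anonymized player identifier (2,250 players total)
    target_image_id: str  # one of the 191 unique target images
    query_index: int      # position in the player's prompt-iteration sequence
    prompt: str           # text prompt the player submitted
    generated_image: str  # path or URL of the resulting generated image
    score: float          # similarity score between generated and target image
```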
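The paper's Algorithm 1 (Image Steerability Estimation) is referenced in the Pseudocode row but not reproduced in this report. The sketch below is one hedged reading, not the authors' algorithm: it treats steerability as the mean number of prompt iterations before a player's similarity score first reaches a threshold. The threshold value and the censoring convention for players who never cross it are assumptions.

```python
# Illustrative steerability estimator, NOT the paper's Algorithm 1.
# Steerability is read here as the mean number of queries a player needs
# before their similarity score first crosses a threshold.
from typing import Sequence

def estimate_steerability(
    score_trajectories: Sequence[Sequence[float]],  # one score list per player
    threshold: float = 0.8,  # hypothetical success threshold
) -> float:
    """Mean number of queries until the score first reaches `threshold`.

    Trajectories that never cross are counted at their full length
    (a simple censoring convention; the paper may handle this differently).
    """
    steps = []
    for traj in score_trajectories:
        crossed = next(
            (i + 1 for i, s in enumerate(traj) if s >= threshold), len(traj)
        )
        steps.append(crossed)
    return sum(steps) / len(steps)

# Example: three players iterating toward one target image.
trajectories = [
    [0.41, 0.63, 0.82],        # crosses on query 3
    [0.55, 0.85],              # crosses on query 2
    [0.30, 0.42, 0.50, 0.61],  # never crosses; censored at 4 queries
]
print(estimate_steerability(trajectories, threshold=0.8))  # -> 3.0
```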
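The generation settings in the Experiment Setup row map naturally onto the Hugging Face diffusers API. The sketch below assumes the public stabilityai/stable-diffusion-2-1 checkpoint and diffusers' DPMSolverMultistepScheduler; the paper does not specify its serving stack, so treat this as a minimal reconstruction of the stated parameters rather than the authors' code.

```python
# Minimal sketch of the stated generation settings: SD v2.1 with the DPM
# multi-step scheduler (Lu et al., 2022), 512x512 output, 20 denoising
# iterations for player-facing images and 50 for AI-generated targets.
# Assumes the Hugging Face `diffusers` library and a public checkpoint.
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")
# Swap the default scheduler for the DPM multi-step scheduler.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

prompt = "a castle on a hill at sunset"  # hypothetical user prompt

# Player-facing generation: 20 iterations at 512 x 512.
image = pipe(prompt, num_inference_steps=20, height=512, width=512).images[0]

# AI-generated target image: same parameters but 50 iterations.
target = pipe(prompt, num_inference_steps=50, height=512, width=512).images[0]
```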