Does Writing with Language Models Reduce Content Diversity?

Authors: Vishakh Padmakumar, He He

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "In this work, we measure the impact of co-writing on diversity via a controlled experiment, where users write argumentative essays in three setups." |
| Researcher Affiliation | Academia | Vishakh Padmakumar, New York University (vishakh@nyu.edu); He He, New York University (hehe@cs.nyu.edu) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | "Code and data are available at https://github.com/vishakhpk/hai-diversity." |
| Open Datasets | Yes | "For the sake of reproducibility, the essays with character-level logs as recorded by the interface along with all model suggestions presented to users will be released after the review period. [...] Code and data are available at https://github.com/vishakhpk/hai-diversity." |
| Dataset Splits | No | The paper describes a controlled experiment with different groups of participants and topics ("In total, we obtained 10 essays on each of the 10 topics for each setting, resulting in 300 essays in total.", i.e., 10 topics × 10 essays × 3 settings), but it does not specify a training/validation/test split in the sense required for reproducible model development or evaluation. |
| Hardware Specification | No | The paper reports using the OpenAI API (davinci, text-davinci-003, gpt-3.5-turbo) to obtain model continuations and summaries, but gives no hardware details (e.g., GPU models, CPU types, or server specifications) for the authors' own experiments or data processing. |
| Software Dependencies | No | The paper names software such as Scikit-learn and models such as davinci, text-davinci-003, gpt-3.5-turbo, and GPT-2, but provides no version numbers for these components, which a reproducible description of ancillary software requires. |
| Experiment Setup | Yes | "Specifically, we sample continuations from both models with a temperature of 0.9 and a frequency penalty of 0.5 (detailed parameters listed in Appendix A.2)." Reported parameters: temperature 0.9, frequency penalty 0.5, presence penalty 0.5. |
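
The decoding parameters in the Experiment Setup row map directly onto the OpenAI completions API. Below is a minimal sketch of how a continuation could be sampled with those parameters; it is not the authors' code, and the model name, prompt, and `max_tokens` value are illustrative assumptions (the paper used davinci and text-davinci-003, which are now deprecated).

```python
# Minimal sketch: sampling one writing suggestion with the decoding
# parameters reported in the paper (Appendix A.2). The model name,
# prompt, and max_tokens below are assumptions for illustration only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

essay_so_far = "Should college athletes be paid? In my view,"  # hypothetical prompt

response = client.completions.create(
    model="gpt-3.5-turbo-instruct",  # stand-in; the paper used davinci / text-davinci-003
    prompt=essay_so_far,
    max_tokens=30,          # suggestion length: an assumption, not stated in this section
    temperature=0.9,        # from the paper
    frequency_penalty=0.5,  # from the paper
    presence_penalty=0.5,   # from the paper
)
print(response.choices[0].text)
```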