Does Writing with Language Models Reduce Content Diversity?

Authors: Vishakh Padmakumar, He He

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "In this work, we measure the impact of co-writing on diversity via a controlled experiment, where users write argumentative essays in three setups." |
| Researcher Affiliation | Academia | Vishakh Padmakumar, New York University (vishakh@nyu.edu); He He, New York University (hehe@cs.nyu.edu) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | "Code and data are available at https://github.com/vishakhpk/hai-diversity." |
| Open Datasets | Yes | "For the sake of reproducibility, the essays with character-level logs as recorded by the interface along with all model suggestions presented to users will be released after the review period. [...] Code and data are available at https://github.com/vishakhpk/hai-diversity." |
| Dataset Splits | No | The paper describes a controlled experiment with different groups of participants and topics ("In total, we obtained 10 essays on each of the 10 topics for each setting, resulting in 300 essays in total.", i.e., 10 topics × 10 essays × 3 settings), but it does not specify a training/validation/test split in the sense required for reproducible model development or evaluation. |
| Hardware Specification | No | The paper reports using the OpenAI API (davinci, text-davinci-003, gpt-3.5-turbo) to obtain model continuations and summaries, but gives no hardware details (e.g., GPU models, CPU types, or server specifications) for the authors' own experiments or data processing. |
| Software Dependencies | No | The paper names software such as Scikit-learn and models such as davinci, text-davinci-003, gpt-3.5-turbo, and GPT-2, but provides no version numbers for these components, which a reproducible description of ancillary software requires. |
| Experiment Setup | Yes | "Specifically, we sample continuations from both models with a temperature of 0.9 and a frequency penalty of 0.5 (detailed parameters listed in Appendix A.2)." Reported parameters: temperature 0.9, frequency penalty 0.5, presence penalty 0.5. |
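
The decoding parameters in the Experiment Setup row map directly onto the OpenAI completions API. Below is a minimal sketch of how a continuation could be sampled with those parameters; it is not the authors' code, and the model name, prompt, and `max_tokens` value are illustrative assumptions (the paper used davinci and text-davinci-003, which are now deprecated).

```python
# Minimal sketch: sampling one writing suggestion with the decoding
# parameters reported in the paper (Appendix A.2). The model name,
# prompt, and max_tokens below are assumptions for illustration only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

essay_so_far = "Should college athletes be paid? In my view,"  # hypothetical prompt

response = client.completions.create(
    model="gpt-3.5-turbo-instruct",  # stand-in; the paper used davinci / text-davinci-003
    prompt=essay_so_far,
    max_tokens=30,          # suggestion length: an assumption, not stated in this section
    temperature=0.9,        # from the paper
    frequency_penalty=0.5,  # from the paper
    presence_penalty=0.5,   # from the paper
)
print(response.choices[0].text)
```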