DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text

Authors: Xianjun Yang, Wei Cheng, Yue Wu, Linda Ruth Petzold, William Yang Wang, Haifeng Chen

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted extensive experiments on the most advanced LLMs from OpenAI, including text-davinci-003, GPT-3.5-turbo, and GPT-4, as well as open-source models such as GPT-NeoX-20B and LLaMA-13B. Results show that our zero-shot approach exhibits state-of-the-art performance in distinguishing between human and GPT-generated text on four English and one German dataset, outperforming OpenAI's own classifier, which is trained on millions of text.
Researcher Affiliation | Collaboration | University of California, Santa Barbara; NEC Laboratories America; University of California, Los Angeles
Pseudocode | No | The paper does not contain structured pseudocode or an algorithm block.
Open Source Code | Yes | The source code is available at https://github.com/Xianjun-Yang/DNA-GPT.
Open Datasets | Yes | One is the Reddit long-form question-answer dataset from the ELI5 community (Fan et al., 2019). We also acquired scientific abstracts published on the Nature website on April 23, 2023... Additionally, we use PubMedQA (Jin et al., 2019), Xsum (Narayan et al., 2018), and the English and German splits of WMT16 (Bojar et al., 2016) following (Mitchell et al., 2023).
Dataset Splits | No | The paper does not explicitly state training, validation, or test dataset splits with percentages, absolute counts, or references to predefined splits for its experiments. It mentions using 150-200 instances from each dataset but does not specify how these instances were partitioned for evaluation.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions using the OpenAI API and T5-3B but does not provide specific version numbers for these or any other software dependencies needed to replicate the experiment.
Experiment Setup | Yes | Unless explicitly stated, we employ a temperature of 0.7 to strike a balance between text diversity and quality for all five models, as has been done in previous research (Krishna et al., 2023). All other parameters remain at their default values, with the exception of a maximum token length of 300. ... The truncation ratio γ was systematically varied across values of {0.02, 0.1, 0.3, 0.5, 0.7, 0.9, 0.98}. ... In practice, we set f(n) = n log(n), n0 = 4, N = 25 and find it works well across all datasets and models.
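
The setup quoted above fixes the hyperparameters of the paper's divergent n-gram analysis: a truncation ratio γ, an n-gram weight f(n) = n log(n) starting from order n0 = 4, and N = 25 regenerations. The following is a minimal Python sketch of a weighted n-gram overlap score in the spirit of that method, not the authors' reference implementation (see the linked repository for that). It assumes simple whitespace tokenization, and `call_llm` is a hypothetical stand-in for the model's completion API; the paper's exact normalization may differ.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def ngram_divergence_score(reference, regenerations, n0=4, n_max=25):
    """Average weighted n-gram overlap between the original continuation
    and the model regenerations; higher overlap suggests machine-generated
    text. The weight f(n) = n * log(n) emphasizes longer, rarer matches,
    following the setup quoted above (n0 = 4)."""
    ref_tokens = reference.split()  # assumption: whitespace tokenization
    ref_grams = {n: ngram_counts(ref_tokens, n) for n in range(n0, n_max + 1)}
    score = 0.0
    for regen in regenerations:
        tokens = regen.split()
        for n in range(n0, n_max + 1):
            if not ref_grams[n] or len(tokens) < n:
                continue
            overlap = sum((ngram_counts(tokens, n) & ref_grams[n]).values())
            score += n * math.log(n) * overlap / (len(tokens) * len(ref_grams[n]))
    return score / len(regenerations)

# Usage sketch: truncate the candidate text at ratio gamma, regenerate the
# suffix N times from the model under test, then score the overlap.
gamma, num_regens = 0.5, 25
candidate = "..."  # text under test
words = candidate.split()
cut = int(len(words) * gamma)
prefix, original_suffix = " ".join(words[:cut]), " ".join(words[cut:])
# regenerations = [call_llm(prefix) for _ in range(num_regens)]  # call_llm is a
# hypothetical stand-in for an OpenAI API completion call at temperature 0.7
# print(ngram_divergence_score(original_suffix, regenerations))
```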