DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text
Authors: Xianjun Yang, Wei Cheng, Yue Wu, Linda Ruth Petzold, William Yang Wang, Haifeng Chen
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted extensive experiments on the most advanced LLMs from OpenAI, including text-davinci-003, GPT-3.5-turbo, and GPT-4, as well as open-source models such as GPT-NeoX-20B and LLaMA-13B. Results show that our zero-shot approach exhibits state-of-the-art performance in distinguishing between human and GPT-generated text on four English and one German dataset, outperforming OpenAI's own classifier, which is trained on millions of texts. |
| Researcher Affiliation | Collaboration | University of California, Santa Barbara; NEC Laboratories America; University of California, Los Angeles |
| Pseudocode | No | The paper does not contain structured pseudocode or an algorithm block. |
| Open Source Code | Yes | The source code is available at https://github.com/Xianjun-Yang/DNA-GPT. |
| Open Datasets | Yes | One is the Reddit long-form question-answer dataset from the ELI5 community (Fan et al., 2019). We also acquired scientific abstracts published on the Nature website on April 23, 2023... Additionally, we use PubMedQA (Jin et al., 2019), XSum (Narayan et al., 2018), and the English and German splits of WMT16 (Bojar et al., 2016) following (Mitchell et al., 2023). |
| Dataset Splits | No | The paper does not explicitly state training, validation, or test dataset splits with percentages, absolute counts, or references to predefined splits for their experiments. It mentions using 150-200 instances from each dataset but does not specify how these instances were partitioned for evaluation. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using the OpenAI API and T5-3B but does not provide specific version numbers for these or any other software dependencies needed to replicate the experiments. |
| Experiment Setup | Yes | Unless explicitly stated, we employ a temperature of 0.7 to strike a balance between text diversity and quality for all five models, as has been done in previous research (Krishna et al., 2023). All other parameters remain at their default values, with the exception of a maximum token length of 300. ... The truncation ratio γ was systematically varied across values of {0.02, 0.1, 0.3, 0.5, 0.7, 0.9, 0.98}. ... In practice, we set f(n)=n log(n), n0=4, N=25 and find it works well across all datasets and models. |
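
The experiment setup above fixes the black-box detector's key hyperparameters: the truncation ratio γ, the n-gram weight f(n) = n log(n), the starting n-gram size n0 = 4, and N = 25 re-generations. The sketch below illustrates how such a weighted n-gram overlap score can be computed; it is a minimal illustration, not the paper's implementation. The `regenerate` hook, the normalization inside `bscore`, the decision `threshold`, and the interpretation of γ as the truncated fraction are all assumptions made for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams of a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bscore(tail, regen_tails, n0=4, n_max=25, f=lambda n: n * math.log(n)):
    """Weighted n-gram overlap between the original continuation (`tail`) and
    the model's re-generated continuations; a higher score suggests the text
    was machine-generated. The normalization below is an illustrative choice,
    not necessarily the paper's exact definition."""
    score = 0.0
    for regen in regen_tails:
        for n in range(n0, n_max + 1):
            cand, gen = ngrams(tail, n), ngrams(regen, n)
            if not cand or not gen:
                continue
            overlap = sum((cand & gen).values())  # count of shared n-grams
            score += f(n) * overlap / (len(regen) * len(cand))
    return score / max(len(regen_tails), 1)


def detect(tokens, regenerate, gamma=0.5, k=25, threshold=1e-3):
    """Black-box DNA-GPT-style detection sketch:
    1) keep the leading part of the text as a prompt (controlled by gamma),
    2) obtain k continuations from the suspect LLM via the hypothetical
       `regenerate(prompt_tokens, k)` hook (e.g. a wrapper around an API),
    3) flag the text if the weighted n-gram overlap exceeds a tuned threshold."""
    # gamma is interpreted here as the fraction truncated and re-generated (assumption)
    cut = int(len(tokens) * (1 - gamma))
    prompt, tail = tokens[:cut], tokens[cut:]
    return bscore(tail, regenerate(prompt, k)) > threshold
```

The weight f(n) = n log(n) gives longer shared n-grams more influence, reflecting that long verbatim overlaps between the candidate continuation and the model's re-generations are unlikely to occur by chance in human-written text.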