Fast-DetectGPT: Efficient Zero-Shot Detection of Machine-Generated Text via Conditional Probability Curvature
Authors: Guangsheng Bao, Yanbin Zhao, Zhiyang Teng, Linyi Yang, Yue Zhang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our evaluations on various datasets, source models, and test conditions indicate that Fast-DetectGPT not only surpasses DetectGPT by a relative margin of around 75% in both the white-box and black-box settings but also accelerates the detection process by a factor of 340, as detailed in Table 1; see also Section 3 (Experiments). |
| Researcher Affiliation | Academia | Guangsheng Bao: Zhejiang University; School of Engineering, Westlake University (baoguangsheng@westlake.edu.cn). Yanbin Zhao: School of Mathematics, Physics and Statistics, Shanghai Polytechnic University (zhaoyb553@nenu.edu.cn). Zhiyang Teng: Nanyang Technological University (zhiyang.teng@ntu.edu.sg). Linyi Yang, Yue Zhang: School of Engineering, Westlake University; Institute of Advanced Technology, Westlake Institute for Advanced Study ({yanglinyi,zhangyue}@westlake.edu.cn). |
| Pseudocode | Yes | Algorithm 1: Fast-DetectGPT machine-generated text detection. |
| Open Source Code | Yes | The code and data are released at https://github.com/baoguangsheng/fast-detect-gpt. |
| Open Datasets | Yes | Datasets. We follow DetectGPT in using six datasets to cover various domains and languages, including XSum for news articles (Narayan et al., 2018), SQuAD for Wikipedia contexts (Rajpurkar et al., 2016), WritingPrompts for story writing (Fan et al., 2018), WMT16 English and German for different languages (Bojar et al., 2016), and PubMedQA for biomedical research question answering (Jin et al., 2019). |
| Dataset Splits | No | The paper describes generating samples and evaluating on them, but it does not specify a separate validation split or give percentages or counts for training and test splits. |
| Hardware Specification | Yes | Speedup assessments were conducted using the XSum news dataset, with computations on a Tesla A100 GPU. The source models are arranged in order of parameter count, and those with fewer than 20 billion parameters were run locally on a Tesla A100 GPU (80 GB). |
| Software Dependencies | No | The paper mentions using PyTorch and the OpenAI API but does not specify exact version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | In practice, we can simply generate 10,000 samples (our default setting)... To encourage the production of content that is both unpredictable and creatively diverse, we use a temperature setting of 0.8. We evaluate these decoding strategies (top-k, top-p, and temperature sampling) using the five models and three datasets listed in Table 2, setting k = 40, p = 0.96, and T = 0.8 for all cases. |
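
The Pseudocode row cites Algorithm 1, which scores a passage by its conditional probability curvature: draw samples x̃ from the sampling model q_φ conditioned on x, estimate the mean and standard deviation of log p_θ(x̃|x) under the scoring model, and normalize log p_θ(x|x) against them. The sketch below is a minimal NumPy rendering of that idea as we understand it, not the released code. It assumes per-position logits from both models are already available as arrays; the function name `fast_detect_gpt`, the toy logits, and the default of 10,000 samples (taken from the Experiment Setup row) are illustrative assumptions.

```python
import numpy as np


def log_softmax(logits):
    # Numerically stable log-softmax over the vocabulary (last) axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def fast_detect_gpt(tokens, scoring_logits, sampling_logits, threshold,
                    n_samples=10_000, seed=0):
    """Hypothetical sketch of a sampling-based curvature estimate.

    tokens:          (n,) observed token ids of passage x
    scoring_logits:  (n, V) logits of the scoring model p_theta at each
                     position, conditioned on the original prefix x_<j
    sampling_logits: (n, V) logits of the sampling model q_phi, same prefix
    threshold:       decision threshold (epsilon)
    Returns (curvature estimate, True if probably machine-generated).
    """
    rng = np.random.default_rng(seed)
    log_p = log_softmax(scoring_logits)
    q = np.exp(log_softmax(sampling_logits))
    n, vocab = log_p.shape
    positions = np.arange(n)

    # Conditional sampling: each token of x~ is drawn from q(. | x_<j) using
    # the ORIGINAL prefix, so positions are sampled independently. Shape (N, n).
    samples = np.stack(
        [rng.choice(vocab, size=n_samples, p=q[j]) for j in range(n)], axis=1
    )

    # log p_theta(x~ | x) for every sample, summed over positions. Shape (N,).
    sample_ll = log_p[positions, samples].sum(axis=1)
    mu = sample_ll.mean()
    sigma = sample_ll.std(ddof=1)  # 1/(N-1) variance estimate

    # log p_theta(x | x) for the observed passage, normalized by the samples.
    observed_ll = log_p[positions, tokens].sum()
    curvature = (observed_ll - mu) / sigma
    return curvature, curvature > threshold
```

In a real run, the logits would come from one forward pass of each model over the passage. Because every sampled token is conditioned on the original prefix, drawing thousands of samples needs no further model calls, which is consistent with the large speedup reported in the Research Type row. A passage made of the most likely tokens scores positive here, while one made of unlikely tokens scores negative.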
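
The Experiment Setup row evaluates top-k, top-p (nucleus), and temperature decoding with k = 40, p = 0.96, and T = 0.8. The sketch below shows one common way to apply these filters to a single next-token logit vector with NumPy. The helper name `filter_logits` and the order in which filters are combined (temperature, then top-k, then top-p) are assumptions; the paper's own generation code may differ.

```python
import numpy as np


def filter_logits(logits, temperature=1.0, top_k=None, top_p=None):
    """Hypothetical helper: next-token probabilities after optional
    temperature scaling, top-k filtering, and top-p (nucleus) filtering."""
    scaled = np.asarray(logits, dtype=float) / temperature
    if top_k is not None:
        # Keep only the k highest-scoring tokens (ties at the k-th value are kept).
        kth = np.sort(scaled)[-min(top_k, scaled.size)]
        scaled = np.where(scaled < kth, -np.inf, scaled)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    if top_p is not None:
        # Nucleus: keep the smallest set of most likely tokens whose
        # cumulative probability reaches top_p, then renormalize.
        order = np.argsort(-probs)
        cumulative = np.cumsum(probs[order])
        cutoff = np.searchsorted(cumulative, top_p) + 1
        keep = np.zeros(probs.shape, dtype=bool)
        keep[order[:cutoff]] = True
        probs = np.where(keep, probs, 0.0)
        probs /= probs.sum()
    return probs
```

To draw a token, pass the result to `np.random.default_rng().choice(len(probs), p=probs)`. Each filter is enabled independently, so `filter_logits(logits, top_k=40)`, `filter_logits(logits, top_p=0.96)`, and `filter_logits(logits, temperature=0.8)` correspond to the three settings evaluated separately; a temperature below 1 sharpens the distribution toward the most likely tokens.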