Multiscale Positive-Unlabeled Detection of AI-Generated Texts
Authors: Yuchuan Tian, Hanting Chen, Xutao Wang, Zheyuan Bai, Qinghua Zhang, Ruifeng Li, Chao Xu, Yunhe Wang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our MPU method augments detection performance on long AI-generated texts, and significantly improves short-text detection of language model detectors. |
| Researcher Affiliation | Collaboration | Yuchuan Tian (1), Hanting Chen (2), Xutao Wang (2), Zheyuan Bai (2), Qinghua Zhang (3), Ruifeng Li (4), Chao Xu (1), Yunhe Wang (2); 1 National Key Lab of General AI, School of Intelligence Science and Technology, Peking University; 2 Huawei Noah's Ark Lab; 3 Huawei Group Finance; 4 Huawei Central Software Institute |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/mindspore-lab/mindone/tree/master/examples/detect_chatgpt and https://github.com/YuchuanTian/AIGC_text_detector. |
| Open Datasets | Yes | Datasets. We choose TweepFake (Fagni et al., 2020) and HC3 (Guo et al., 2023) as benchmarks for our experiments. |
| Dataset Splits | Yes | Datasets. We choose TweepFake (Fagni et al., 2020) and HC3 (Guo et al., 2023) as benchmarks for our experiments. In the ChatGPT text detection experiments, we follow the setting of HC3 (Guo et al., 2023) to test the performance of our method. |
| Hardware Specification | Yes | We use a single Nvidia Tesla V100 as the device for experiments. We gratefully acknowledge the support of MindSpore, CANN and Ascend AI Processor used for this research. |
| Software Dependencies | No | The paper mentions software like MindSpore and CANN but does not provide specific version numbers for these or other key software dependencies. |
| Experiment Setup | Yes | Following the training setting of Kumarage et al. (2023), we use batch size 16 and learning rate 1e-5 for TweepFake; following the setting of Guo et al. (2023), we use batch size 32 and learning rate 5e-5 for HC3. AdamW optimizers are adopted. We replicate all experiments three times to avoid fluctuation, using seeds 0, 1, and 2. |
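
The reported setup translates directly into a fine-tuning configuration. The sketch below is not the authors' code: only the hyperparameters (batch sizes, learning rates, AdamW, seeds 0-2) come from the paper, while the backbone checkpoint (`roberta-base`), library choice (PyTorch/Transformers), and dataset loading are illustrative assumptions.

```python
# Minimal sketch of the reported fine-tuning configuration (assumptions noted above).
import torch
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification, AutoTokenizer, set_seed

# Hyperparameters reported in the paper for each benchmark.
CONFIGS = {
    "tweepfake": {"batch_size": 16, "lr": 1e-5},  # setting of Kumarage et al. (2023)
    "hc3":       {"batch_size": 32, "lr": 5e-5},  # setting of Guo et al. (2023)
}

def build_run(dataset: str, seed: int):
    """Build model, tokenizer, and optimizer for one replicated run."""
    cfg = CONFIGS[dataset]
    set_seed(seed)  # experiments are replicated with seed = 0, 1, 2
    # Backbone is an assumption; the paper fine-tunes a language-model detector.
    model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
    tokenizer = AutoTokenizer.from_pretrained("roberta-base")
    optimizer = AdamW(model.parameters(), lr=cfg["lr"])
    return model, tokenizer, optimizer, cfg["batch_size"]

# Three replicated runs per benchmark, as described in the experiment setup.
for seed in (0, 1, 2):
    model, tokenizer, optimizer, batch_size = build_run("hc3", seed)
    # ... fine-tune on the HC3 train split with `batch_size`, then evaluate on the test split ...
```

Under this reading, each benchmark result is the average over the three seeded runs, which matches the paper's statement that experiments are replicated to avoid fluctuation.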