Multiscale Positive-Unlabeled Detection of AI-Generated Texts
Authors: Yuchuan Tian, Hanting Chen, Xutao Wang, Zheyuan Bai, Qinghua Zhang, Ruifeng Li, Chao Xu, Yunhe Wang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our MPU method augments detection performance on long AI-generated texts, and significantly improves short-text detection of language model detectors. |
| Researcher Affiliation | Collaboration | Yuchuan Tian (1), Hanting Chen (2), Xutao Wang (2), Zheyuan Bai (2), Qinghua Zhang (3), Ruifeng Li (4), Chao Xu (1), Yunhe Wang (2); 1 National Key Lab of General AI, School of Intelligence Science and Technology, Peking University; 2 Huawei Noah's Ark Lab; 3 Huawei Group Finance; 4 Huawei Central Software Institute |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/mindspore-lab/mindone/tree/master/examples/detect_chatgpt and https://github.com/YuchuanTian/AIGC_text_detector. |
| Open Datasets | Yes | Datasets. We choose TweepFake (Fagni et al., 2020) and HC3 (Guo et al., 2023) as benchmarks for our experiments. |
| Dataset Splits | Yes | Datasets. We choose TweepFake (Fagni et al., 2020) and HC3 (Guo et al., 2023) as benchmarks for our experiments. In the ChatGPT text detection experiments, we follow the setting of HC3 (Guo et al., 2023) to test the performance of our method. |
| Hardware Specification | Yes | We use a single Nvidia Tesla V100 as the device for experiments. We gratefully acknowledge the support of MindSpore, CANN and Ascend AI Processor used for this research. |
| Software Dependencies | No | The paper mentions software like MindSpore and CANN but does not provide specific version numbers for these or other key software dependencies. |
| Experiment Setup | Yes | Following the training setting of Kumarage et al. (2023), we use batch size 16 and learning rate 1e-5 for TweepFake; following the setting of Guo et al. (2023), we use batch size 32 and learning rate 5e-5 for HC3. AdamW optimizers are adopted. We replicate all experiments three times to avoid fluctuation, using seeds 0, 1, and 2. |
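
The reported setup translates directly into a fine-tuning configuration. The sketch below is not the authors' code: only the hyperparameters (batch sizes, learning rates, AdamW, seeds 0-2) come from the paper, while the backbone checkpoint (`roberta-base`), library choice (PyTorch/Transformers), and dataset loading are illustrative assumptions.

```python
# Minimal sketch of the reported fine-tuning configuration (assumptions noted above).
import torch
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification, AutoTokenizer, set_seed

# Hyperparameters reported in the paper for each benchmark.
CONFIGS = {
    "tweepfake": {"batch_size": 16, "lr": 1e-5},  # setting of Kumarage et al. (2023)
    "hc3":       {"batch_size": 32, "lr": 5e-5},  # setting of Guo et al. (2023)
}

def build_run(dataset: str, seed: int):
    """Build model, tokenizer, and optimizer for one replicated run."""
    cfg = CONFIGS[dataset]
    set_seed(seed)  # experiments are replicated with seed = 0, 1, 2
    # Backbone is an assumption; the paper fine-tunes a language-model detector.
    model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
    tokenizer = AutoTokenizer.from_pretrained("roberta-base")
    optimizer = AdamW(model.parameters(), lr=cfg["lr"])
    return model, tokenizer, optimizer, cfg["batch_size"]

# Three replicated runs per benchmark, as described in the experiment setup.
for seed in (0, 1, 2):
    model, tokenizer, optimizer, batch_size = build_run("hc3", seed)
    # ... fine-tune on the HC3 train split with `batch_size`, then evaluate on the test split ...
```

Under this reading, each benchmark result is the average over the three seeded runs, which matches the paper's statement that experiments are replicated to avoid fluctuation.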