DeTeCtive: Detecting AI-generated Text via Multi-Level Contrastive Learning
Authors: Xun Guo, Yongxin He, Shan Zhang, Ting Zhang, Wanquan Feng, Haibin Huang, Chongyang Ma
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our method enhances the ability of various text encoders in detecting AI-generated text across multiple benchmarks and achieves state-of-the-art results. |
| Researcher Affiliation | Collaboration | Xun Guo¹, Shan Zhang², Yongxin He², Ting Zhang², Wanquan Feng¹, Haibin Huang¹, Chongyang Ma¹ (¹ByteDance, ²University of Chinese Academy of Sciences) |
| Pseudocode | No | The paper describes the steps of the proposed method (e.g., multi-level contrastive learning, dense information retrieval pipeline) but does not present them in a structured pseudocode or algorithm block. |
| Open Source Code | Yes | Our code is available at https://github.com/heyongxin233/DeTeCtive |
| Open Datasets | Yes | In this study, we employ three widely-used and challenging datasets to evaluate our proposed method. The Deepfake [39] dataset includes text generated by 27 different LLMs and human-written content from multiple websites across 10 domains, encompassing 332K training and 57K test data. It also outlines six diverse testing scenarios, covering an array of settings from domain-specific to cross-domain, and out-of-distribution detection scenarios. The M4 [68] dataset is a multi-domain, multi-model, and multi-language dataset encompassing data from 8 LLMs, 6 domains, and 9 languages. [...] Finally, we make use of the TuringBench [61] dataset. |
| Dataset Splits | Yes | The Deepfake [39] dataset includes text generated by 27 different LLMs and human-written content from multiple websites across 10 domains, encompassing 332K training and 57K test data. |
| Hardware Specification | Yes | We train for 50 epochs with batch size of 32 per GPU on 8 NVIDIA V100 GPUs. |
| Software Dependencies | No | For all our method's experiments, we use the interfaces and pre-trained model weights from the Hugging Face transformers [28] library. [...] During inference, we implement an efficient K-Nearest Neighbors (KNN) [15] algorithm provided by the Faiss [46] library to perform classification. |
| Experiment Setup | Yes | All experiments use the same hyperparameters and an AdamW [44] optimizer with a cosine annealing learning rate schedule. The peak learning rate is set at 2e-05, warmed up linearly for 2000 steps, and weight decay is set to 1e-04. The maximum input token length is 512. We train for 50 epochs with batch size of 32 per GPU on 8 NVIDIA V100 GPUs. |
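The Pseudocode row notes that the paper describes its multi-level contrastive learning and dense information retrieval pipeline only in prose. Purely as a reference point, the sketch below implements a standard supervised contrastive (SupCon-style) objective over a single level of labels (e.g., the source LLM); it is not the paper's multi-level formulation, which combines several label granularities, and all function names here are hypothetical.

```python
# Hypothetical reference only: a SupCon-style contrastive loss over one label
# granularity. The paper's actual multi-level objective is not reproduced here.
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings: torch.Tensor,
                                labels: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
    """embeddings: (N, D) un-normalized; labels: (N,) integer class ids."""
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / temperature                        # (N, N) cosine similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim.masked_fill_(self_mask, float("-inf"))           # drop self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(self_mask, 0.0)      # avoid -inf * 0 below
    # Positives: other samples sharing the same label as the anchor.
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(log_prob * pos_mask).sum(dim=1) / pos_counts
    # Average over anchors that actually have at least one positive.
    return loss[pos_mask.any(dim=1)].mean()
```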
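The Software Dependencies row indicates that inference uses a KNN classifier backed by the Faiss library over learned text embeddings. The sketch below shows one plausible setup with `faiss.IndexFlatIP` on L2-normalized embeddings; the function names, the cosine-similarity choice, and k=5 are assumptions rather than details taken from the paper.

```python
# Hypothetical sketch: KNN classification over text embeddings with Faiss.
import numpy as np
import faiss

def build_index(train_embeddings: np.ndarray) -> faiss.IndexFlatIP:
    """Store L2-normalized training embeddings in a flat inner-product index."""
    emb = np.ascontiguousarray(train_embeddings.astype("float32"))
    faiss.normalize_L2(emb)                  # cosine similarity via inner product
    index = faiss.IndexFlatIP(emb.shape[1])
    index.add(emb)
    return index

def classify(query_embeddings: np.ndarray, index: faiss.IndexFlatIP,
             train_labels: np.ndarray, k: int = 5) -> np.ndarray:
    """Majority vote over the k nearest training examples for each query."""
    q = np.ascontiguousarray(query_embeddings.astype("float32"))
    faiss.normalize_L2(q)
    _, neighbor_ids = index.search(q, k)     # (num_queries, k) neighbor indices
    neighbor_labels = train_labels[neighbor_ids]
    # Per-query majority vote over non-negative integer labels.
    return np.array([np.bincount(row).argmax() for row in neighbor_labels])
```

In this kind of pipeline the index is built once from the training-set embeddings and then reused for every test query, which keeps inference cost at a single encoder forward pass plus a nearest-neighbor lookup.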
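The Experiment Setup row reports AdamW with a peak learning rate of 2e-05, 2000 linear warmup steps, weight decay of 1e-04, and a cosine annealing schedule. A minimal sketch of that configuration is given below, assuming the Hugging Face `get_cosine_schedule_with_warmup` scheduler; this scheduler choice is consistent with, but not confirmed by, the paper.

```python
# Minimal sketch of the reported optimization setup (scheduler choice assumed).
import torch
from transformers import get_cosine_schedule_with_warmup

def build_optimizer(model: torch.nn.Module, num_training_steps: int):
    """AdamW with the reported peak LR and weight decay, plus warmup + cosine decay."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=1e-4)
    scheduler = get_cosine_schedule_with_warmup(
        optimizer,
        num_warmup_steps=2000,
        num_training_steps=num_training_steps,
    )
    return optimizer, scheduler
```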