reproducibilityindex.ai

DeTeCtive: Detecting AI-generated Text via Multi-Level Contrastive Learning

Authors: Xun Guo, Yongxin He, Shan Zhang, Ting Zhang, Wanquan Feng, Haibin Huang, Chongyang Ma

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments demonstrate that our method enhances the ability of various text encoders in detecting AI-generated text across multiple benchmarks and achieves state-of-the-art results.
Researcher Affiliation	Collaboration	Xun Guo¹, Shan Zhang², Yongxin He², Ting Zhang², Wanquan Feng¹, Haibin Huang¹, Chongyang Ma¹; ¹ByteDance, ²University of Chinese Academy of Sciences
Pseudocode	No	The paper describes the steps of the proposed method (e.g., multi-level contrastive learning, dense information retrieval pipeline) but does not present them in a structured pseudocode or algorithm block.
Open Source Code	Yes	Our code is available at https://github.com/heyongxin233/DeTeCtive
Open Datasets	Yes	In this study, we employ three widely-used and challenging datasets to evaluate our proposed method. The Deepfake [39] dataset includes text generated by 27 different LLMs and human-written content from multiple websites across 10 domains, encompassing 332K training and 57K test data. It also outlines six diverse testing scenarios, covering an array of settings from domain-specific to cross-domains, and out-of-distribution detection scenarios. The M4 [68] dataset is a multi-domain, multi-model, and multi-language dataset encompassing data from 8 LLMs, 6 domains, and 9 languages. [...] Finally, we make use of the TuringBench [61] dataset.
Dataset Splits	Yes	The Deepfake [39] dataset includes text generated by 27 different LLMs and human-written content from multiple websites across 10 domains, encompassing 332K training and 57K test data.
Hardware Specification	Yes	We train for 50 epochs with batch size of 32 per GPU on 8 NVIDIA V100 GPUs.
Software Dependencies	No	For all our method's experiments, we use the interfaces and pre-trained model weights from the Hugging Face transformers [28] library. [...] During inference, we implement with an efficient K-Nearest Neighbors (KNN) [15] algorithm provided by the Faiss [46] library, to perform classification.
Experiment Setup	Yes	All experiments use the same hyperparameters and an AdamW [44] optimizer with a cosine annealing learning rate schedule. The peak learning rate is set at 2e-05, warmed up linearly for 2000 steps, and weight decay is set to 1e-04. The maximum input token length is 512. We train for 50 epochs with batch size of 32 per GPU on 8 NVIDIA V100 GPUs.
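
The learning rate schedule quoted in the Experiment Setup row can be sketched in a few lines of plain Python. This is only an illustration of linear warmup followed by cosine annealing, not the authors' training code. The peak learning rate (2e-05), the 2000 warmup steps, and the per-GPU batch size and GPU count come from the quoted text. The function name `lr_at`, the `total_steps` argument, a warmup that starts from zero, and a final learning rate of zero are assumptions made for the example.

```python
import math

PEAK_LR = 2e-5        # peak learning rate, from the quoted setup
WARMUP_STEPS = 2000   # linear warmup length, from the quoted setup

# Samples per optimizer step, ASSUMING plain data parallelism with no
# gradient accumulation (the quoted text gives only 32 per GPU on 8 GPUs).
GLOBAL_BATCH = 32 * 8


def lr_at(step, total_steps, peak_lr=PEAK_LR,
          warmup_steps=WARMUP_STEPS, min_lr=0.0):
    """Hypothetical helper: learning rate at a given optimizer step.

    Rises linearly from 0 to peak_lr over warmup_steps, then follows a
    cosine curve down to min_lr at total_steps. Starting at 0 and ending
    at min_lr=0.0 are assumptions; the source does not state them.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 10_000  # illustrative step count, not taken from the paper
    for s in (0, 1000, 2000, 6000, 10_000):
        print(s, lr_at(s, total))
```

In this sketch the rate reaches its peak exactly at step 2000 and falls to half the peak midway through the cosine phase. The true total number of steps depends on the dataset size and effective batch size, which the quoted text does not pin down exactly.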