Position: On the Possibilities of AI-Generated Text Detection

Authors: Souradip Chakraborty, Amrit Singh Bedi, Sicheng Zhu, Bang An, Dinesha Manocha, Furong Huang

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our comprehensive empirical tests, conducted across various datasets (XSum, SQuAD, IMDb, and Kaggle Fake News) and with several state-of-the-art text generators (GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, Llama-2-70B-Chat-HF), assess the viability of enhanced detection methods against detectors such as RoBERTa-Large/Base-Detector and GPTZero, with increasing sample sizes and sequence lengths. (A simulation sketch of this sample-size effect follows the table.)
Researcher Affiliation | Academia | University of Maryland, College Park, MD, USA; University of Central Florida, FL, USA. Correspondence to: Souradip Chakraborty <schakra3@umd.edu>, Amrit Singh Bedi <amritbedi@ucf.edu>.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper neither provides concrete access to its source code nor states that the code will be released.
Open Datasets | Yes | Our experimental analysis spans four critical datasets: news articles from the XSum dataset (Narayan et al., 2018), Wikipedia paragraphs from the SQuAD dataset (Rajpurkar et al., 2016), IMDb reviews (Maas et al., 2011), and the Kaggle Fake News dataset (Lifferth, 2018). (A loading sketch for the publicly hosted datasets follows the table.)
Dataset Splits | No | The paper mentions training models and evaluating "test AUROC" but gives no percentages or sample counts for train/validation/test splits, nor does it cite predefined splits.
Hardware Specification | No | The paper does not specify hardware such as GPU or CPU models, memory, or cloud instance types used for its experiments; it only mentions using 'standard ML models'.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or other library versions).
Experiment Setup | No | The paper names specific models (Logistic Regression, Random Forest, a 2-layer Neural Network), feature representations (TF-IDF bag of words), and specific text generators/detectors (GPT-2, GPT-3.5-Turbo, Llama, RoBERTa-Large/Base-Detector, GPTZero), but it provides no hyperparameters such as learning rates, batch sizes, epoch counts, or optimizer settings. (A minimal training sketch under these descriptions follows the table.)
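
The Research Type row points to the paper's central empirical claim: detection becomes more reliable as the number of observed samples grows. The simulation below is a minimal illustration of that effect, not the authors' code; the Gaussian score model and the +0.2 mean shift for AI-generated text are assumptions chosen purely to make the trend visible.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def pooled_score(n_passages: int, ai: bool) -> float:
    """Stand-in for a real detector score (e.g., from a RoBERTa-based
    detector or GPTZero), averaged over n passages from one source.
    The +0.2 shift for AI text is an illustrative assumption."""
    shift = 0.2 if ai else 0.0
    return rng.normal(shift, 1.0, size=n_passages).mean()

# AUROC rises toward 1.0 as scores are pooled over more passages,
# mirroring the paper's "increasing sample sizes" evaluation.
for n in (1, 5, 25):
    labels = [0] * 200 + [1] * 200          # 0 = human, 1 = AI-generated
    scores = [pooled_score(n, bool(y)) for y in labels]
    print(f"n={n:>2}  AUROC={roc_auc_score(labels, scores):.3f}")
```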
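
Three of the four datasets cited in the Open Datasets row are hosted on the Hugging Face Hub; the Kaggle Fake News data (Lifferth, 2018) must be downloaded from Kaggle directly. A minimal loading sketch, assuming the `datasets` library and the standard Hub identifiers (the identifiers and field names below are not taken from the paper):

```python
from datasets import load_dataset

# Depending on the library version, XSum may require a different
# identifier (e.g., "EdinburghNLP/xsum") or trust_remote_code=True.
xsum = load_dataset("xsum", split="test")          # news articles
squad = load_dataset("squad", split="validation")  # Wikipedia paragraphs
imdb = load_dataset("imdb", split="test")          # movie reviews

print(xsum[0]["document"][:200])   # article text
print(squad[0]["context"][:200])   # Wikipedia paragraph
print(imdb[0]["text"][:200])       # review text
```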
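
Finally, the Experiment Setup row lists the classifier family and features but no hyperparameters. The sketch below instantiates one of the named configurations (TF-IDF bag-of-words with Logistic Regression, evaluated by test AUROC) on toy data; the split ratio, random seed, and all hyperparameters are scikit-learn defaults or placeholder assumptions, not values from the paper.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Toy stand-ins: in a replication these would be human-written vs.
# model-generated passages from XSum, SQuAD, IMDb, or Kaggle Fake News.
human_texts = ["the film was a quiet, moving portrait of grief"] * 50
ai_texts = ["the movie was good and I liked the movie a lot"] * 50
texts = human_texts + ai_texts
labels = [0] * 50 + [1] * 50       # 0 = human, 1 = AI-generated

# 80/20 split is an assumption; the paper reports no split scheme.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=0
)

vectorizer = TfidfVectorizer()           # TF-IDF bag-of-words features
clf = LogisticRegression(max_iter=1000)  # one of the three named classifiers
clf.fit(vectorizer.fit_transform(X_train), y_train)

scores = clf.predict_proba(vectorizer.transform(X_test))[:, 1]
print("test AUROC:", roc_auc_score(y_test, scores))
```

Random Forest or a 2-layer neural network could be swapped in for the classifier without changing the rest of the pipeline, which is what makes the missing hyperparameters the main obstacle to an exact replication.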