reproducibilityindex.ai

Raidar: geneRative AI Detection viA Rewriting

Authors: Chengzhi Mao, Carl Vondrick, Hao Wang, Junfeng Yang

ICLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Visualizations, empirical experiments show that our simple rewriting-based algorithm Raidar significantly improves detection for several established paragraph-level detection benchmarks.
Researcher Affiliation	Collaboration	Chengzhi Mao¹, Carl Vondrick¹, Hao Wang², Junfeng Yang¹; ¹Columbia University, ²Rutgers University
Pseudocode	Yes	Algorithm 1 Detecting LLM Generated Content via Output Invariance
Open Source Code	Yes	Our data and code is available at https://github.com/cvlab-columbia/RaidarLLMDetect.git.
Open Datasets	Yes	Creative Writing Dataset is a language dataset based on the subreddit Writing Prompts, which is creative writing by a community based on the prompts. We use the dataset generated by Verma et al. (2023).
Dataset Splits	Yes	The training and testing domain for Table 2. For all experiments in Table 2, we use logistic regression, and use the same source and target for invariance, equivariance, and uncertainty. For News, we train on Creative Writing and test on News. For Creative Writing, we train on News and test on Creative Writing. For Student Essay, we train on News and test on Student Essay.
Hardware Specification	No	The paper mentions interacting with LLM APIs (e.g., GPT-3.5-Turbo) but does not specify the hardware used for running their own experiments and models.
Software Dependencies	No	The paper mentions using Logistic Regression and XGBoost for classification but does not specify the version numbers of the software libraries or programming languages used.
Experiment Setup	Yes	We use GPT-3.5-Turbo as the LLM to rewrite the input text. Once we obtain the editing distance feature from the rewriting, we use Logistic Regression (Berkson, 1944) or XGBoost (Chen & Guestrin, 2016) to perform the binary classification.
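
The Experiment Setup row describes a pipeline in which an LLM rewrites the input, an editing-distance feature is computed between the input and its rewrite, and a binary classifier is trained on that feature. The sketch below illustrates only the feature step, assuming a character-level Levenshtein distance normalized by the longer string. The function names `levenshtein` and `rewrite_feature` are hypothetical, the rewrite is passed in as a plain string rather than fetched from an API, and the exact distance measure used in the paper may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance using the standard two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        previous = current
    return previous[-1]


def rewrite_feature(original: str, rewritten: str) -> float:
    """Hypothetical feature: edit distance normalized to the range [0, 1].

    Assumption: a rewrite that changes little yields a small value, and a
    heavily edited rewrite yields a value closer to 1.
    """
    longest = max(len(original), len(rewritten))
    if longest == 0:
        return 0.0
    return levenshtein(original, rewritten) / longest


if __name__ == "__main__":
    # Illustrative strings only; in practice `rewritten` would come from the LLM.
    original = "The cat sat on the mat."
    rewritten = "The cat was sitting on the mat."
    print(rewrite_feature(original, rewritten))
```

One feature value per text (or several, if multiple rewriting prompts are used) could then be passed to a logistic regression or gradient-boosted classifier, with training and test texts drawn from different domains as in the Dataset Splits row.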