Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
A Survey on the Possibilities & Impossibilities of AI-generated Text Detection
Authors: Soumya Suvra Ghosal, Souradip Chakraborty, Jonas Geiping, Furong Huang, Dinesh Manocha, Amrit Singh Bedi
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this survey, we aim to provide a concise categorization and overview of current work encompassing both the prospects and the limitations of AI-generated text detection. ... Specifically, on the XSum dataset, when samples are paraphrased using the OpenAI GPT-3.5-Turbo API (which is different from the paraphraser used during training of RADAR), RADAR improves detection performance by 16.6% and 59.5% compared to a RoBERTa-based detector fine-tuned on WebText (Gokaslan et al., 2019) and DetectGPT (Mitchell et al., 2023). Table 1: Evaluating popular language models using state-of-the-art post-hoc detectors on the XSum, SQuAD, and WP datasets. The table is motivated by Hu et al. (2023). The values are obtained by reproducing the results in Hu et al. (2023). |
| Researcher Affiliation | Academia | Soumya Suvra Ghosal*, University of Maryland, College Park, MD, USA; Souradip Chakraborty*, University of Maryland, College Park, MD, USA; Jonas Geiping, University of Maryland, College Park, MD, USA; Furong Huang, University of Maryland, College Park, MD, USA; Dinesh Manocha, University of Maryland, College Park, MD, USA; Amrit Singh Bedi, University of Maryland, College Park, MD, USA |
| Pseudocode | No | The paper describes various methods, such as the watermarking operation (Section 4.1.2), in numbered steps; however, these are presented as descriptive text rather than formal pseudocode blocks or algorithms. |
| Open Source Code | No | The paper is a survey and does not introduce a new methodology for which dedicated source code would typically be released. It does not contain any explicit statements about code availability or links to code repositories for the work described in this paper. |
| Open Datasets | No | This paper is a survey and does not conduct its own experiments requiring a dataset. While it references various datasets used in the reviewed literature (e.g., XSum, SQuAD, WebText), it does not provide access information for a dataset used in its own analysis or methodology. |
| Dataset Splits | No | This paper is a survey and does not present its own experimental results. Therefore, it does not provide specific dataset split information for data partitioning. |
| Hardware Specification | No | This paper is a survey and does not describe its own experimental setup or computations that would require specific hardware specifications. |
| Software Dependencies | No | This paper is a survey and does not describe its own experimental implementation or methodology that would require specific ancillary software details with version numbers. |
| Experiment Setup | No | This paper is a survey and does not describe its own experimental setup, hyperparameters, or system-level training settings. |