Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
RADAR: Robust AI-Text Detection via Adversarial Learning
Authors: Xiaomeng Hu, Pin-Yu Chen, Tsung-Yi Ho
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluated with 8 different LLMs (Pythia, Dolly 2.0, Palmyra, Camel, GPT-J, Dolly 1.0, LLaMA, and Vicuna) across 4 datasets, experimental results show that RADAR significantly outperforms existing AI-text detection methods, especially when paraphrasing is in place. |
| Researcher Affiliation | Collaboration | Xiaomeng Hu The Chinese University of Hong Kong Sha Tin, Hong Kong EMAIL Pin-Yu Chen IBM Research New York, USA EMAIL Tsung-Yi Ho The Chinese University of Hong Kong Sha Tin, Hong Kong EMAIL |
| Pseudocode | Yes | Algorithm 1 RADAR: Robust AI-Text Detection via Adversarial Learning |
| Open Source Code | No | Project Page and Demos: https://radar.vizhub.ai IBM demo is developed by Hendrik Strobelt and Benjamin Hoover at IBM Research Hugging Face demo is developed by Xiaomeng Hu |
| Open Datasets | Yes | For training, we sampled 160K documents from WebText [9] to build the human-text corpus H. ... [9] Aaron Gokaslan, Vanya Cohen, Ellie Pavlick, and Stefanie Tellex. Openwebtext corpus. http://Skylion007.github.io/OpenWebTextCorpus, 2019. |
| Dataset Splits | Yes | During training, we use the test set of WebText as the validation dataset to estimate RADAR's performance. ... Table A1: Summary of the used human-text corpora Phase Source Dataset Dataset Key Sample Counts ... Validation WebText-test text 4007 |
| Hardware Specification | Yes | Experiments were run on 2 GPUs (NVIDIA Tesla V100 32GB). |
| Software Dependencies | No | No specific version numbers for key software components (e.g., Python, PyTorch, TensorFlow, or specific library versions) were found. |
| Experiment Setup | Yes | During training, we set the batch size to 32 and train the models until the validation loss converges. We use AdamW as the optimizer with the initial learning rate set to 1e-5 and use linear decay for both Gσ and Dφ. We set λ = 0.5 for sample balancing in Eq. 3 and set γ = 0.01 in Eq. 2. |
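
The Experiment Setup row can be read as a small training configuration. The following is a minimal sketch of that reading, not the authors' code: the `TrainConfig` class, the `linear_decay_lr` helper, and the `total_steps` value are hypothetical names and numbers introduced for illustration. The paper reports training until the validation loss converges, so no step count is given, and decaying all the way to zero is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values quoted in the Experiment Setup row.
    batch_size: int = 32
    initial_lr: float = 1e-5      # initial learning rate for AdamW
    lambda_balance: float = 0.5   # lambda, sample balancing in Eq. 3
    gamma: float = 0.01           # gamma in Eq. 2
    # Hypothetical: the source gives no fixed number of training steps.
    total_steps: int = 1000


def linear_decay_lr(step: int, cfg: TrainConfig) -> float:
    """Learning rate at `step`, decaying linearly from initial_lr to 0.

    The zero endpoint is an assumption; the source only says "linear decay".
    """
    remaining = max(0, cfg.total_steps - step)
    return cfg.initial_lr * remaining / cfg.total_steps


cfg = TrainConfig()
# Learning rate at the start, midpoint, and end of the assumed schedule.
schedule = [linear_decay_lr(s, cfg) for s in (0, 500, 1000)]
```

Under these assumptions the same schedule would be applied to both the paraphraser and the detector, each with its own optimizer instance.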