Few-Shot Detection of Machine-Generated Text using Style Representations
Authors: Rafael Alberto Rivera Soto, Kailin Koch, Aleem Khan, Barry Y. Chen, Marcus Bishop, Nicholas Andrews
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach outperforms prominent few-shot learning methods as well as standard zero-shot baselines and differs significantly from prior work in that we do not require access to the predictive distribution of the unseen LLM, like Mitchell et al. (2023), or a large number of samples from it, like Zellers et al. (2019), to effectively detect text generated by these models. We also explore factors leading to effective style representations for this task, finding that contrastive training on large amounts of human-authored text is sufficient to obtain useful representations, but that in certain few-shot settings, training on additional LLM-generated documents significantly improves performance. (A minimal sketch of this few-shot, style-based detection setup follows the table.) |
| Researcher Affiliation | Collaboration | Rafael Rivera Soto (1,3), Kailin Koch (1), Aleem Khan (3), Barry Chen (1), Marcus Bishop (2), Nicholas Andrews (3); 1: Lawrence Livermore National Laboratory, 2: U.S. Department of Defense, 3: Johns Hopkins University |
| Pseudocode | No | The paper describes methods textually but does not include any explicit pseudocode blocks or algorithms. |
| Open Source Code | Yes | The code and data to reproduce our experiments are available at https://github.com/LLNL/LUAR/tree/main/fewshot_iclr2024. |
| Open Datasets | Yes | The data used to fine-tune the UAR style representations was sampled from a publicly available corpus of Reddit comments (Baumgartner et al., 2020). Additionally, we used Amazon reviews (Ni et al., 2019) and Stack Exchange discussions in some model variations, both obtained from existing datasets. The Amazon dataset may be downloaded from https://nijianmo.github.io/amazon/index.html and the Stack Exchange dataset is available from https://pan.webis.de/clef21/pan21-web/style-change-detection.html. |
| Dataset Splits | Yes | We balanced each dataset with an equal number of human-generated examples before splitting into training, validation and testing splits. Table 4b (numbers of documents used to train and evaluate authorship models): Machine — 440,721 train / 62,935 valid / 125,987 test; Human — 440,721 / 62,935 / 125,987; Total — 881,442 / 125,870 / 251,974. (A balanced-splitting sketch follows the table.) |
| Hardware Specification | Yes | We trained the style representations using one server with 8 × A100 80 GB GPUs, which took under 24 hours for each of the proposed model variations. |
| Software Dependencies | No | The paper mentions software such as PyTorch, spaCy, and learn2learn, but does not provide the specific version numbers for these dependencies that reproducibility requires. |
| Experiment Setup | Yes | We generate completions using a variety of parameters described in Table 4a of Appendix C: models GPT2-large, GPT2-xl, OPT-6.7B, OPT-13B; decoding strategies top-p and typical-p; decoding values 0.7 and 0.95; temperature values 0.7 and 0.9; generation length 512 tokens. (A generation sketch using these settings follows the table.) |
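
For the Research Type row, the approach can be read as comparing a query document's style embedding against embeddings of a handful of known samples. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: `embed` is a toy hashed-trigram stand-in for the trained UAR-style encoder, and `few_shot_detect` is a hypothetical helper.

```python
import numpy as np

def embed(texts, dim=256):
    """Toy stand-in for a style encoder: hashed character-trigram counts.

    The paper uses learned style representations (UAR); this placeholder only
    exists so the sketch runs end to end without the trained model.
    """
    vecs = np.zeros((len(texts), dim))
    for i, text in enumerate(texts):
        for j in range(len(text) - 2):
            vecs[i, hash(text[j:j + 3]) % dim] += 1.0
    return vecs

def few_shot_detect(query_doc, llm_docs, human_docs):
    """Score a query document by comparing its style embedding against the
    centroids of a few LLM-generated and human-written reference documents.

    Positive scores mean the query is closer (cosine) to the LLM centroid.
    """
    def centroid(texts):
        vecs = embed(texts)
        vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
        return vecs.mean(axis=0)

    q = embed([query_doc])[0]
    q = q / np.linalg.norm(q)
    return float(q @ centroid(llm_docs) - q @ centroid(human_docs))
```

The key property this illustrates is the one highlighted in the row above: the detector needs only a few text samples from the unseen LLM, not its predictive distribution.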
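
For the Dataset Splits row, a minimal sketch of class-balanced splitting. The `balanced_splits` helper and the split fractions (roughly 70/10/20, consistent with the counts quoted from Table 4b) are illustrative assumptions, not code from the released repository.

```python
import random

def balanced_splits(machine_docs, human_docs, valid_frac=0.1, test_frac=0.2, seed=0):
    """Downsample to equal class sizes, shuffle, and carve out train/valid/test.

    Labels: 1 = machine-generated, 0 = human-written. Fractions are illustrative.
    """
    rng = random.Random(seed)
    n = min(len(machine_docs), len(human_docs))
    data = [(doc, 1) for doc in rng.sample(machine_docs, n)]
    data += [(doc, 0) for doc in rng.sample(human_docs, n)]
    rng.shuffle(data)
    n_test = int(len(data) * test_frac)
    n_valid = int(len(data) * valid_frac)
    test = data[:n_test]
    valid = data[n_test:n_test + n_valid]
    train = data[n_test + n_valid:]
    return train, valid, test
```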
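
For the Experiment Setup row, the sketch below loops over the sampling grid from Table 4a using the Hugging Face `transformers` generation API. The Hugging Face model identifiers (`gpt2-large`, `facebook/opt-6.7b`, etc.) and the `generate_completions` helper are assumptions for illustration; the paper's own generation scripts may differ.

```python
from itertools import product

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sampling grid mirroring Table 4a (decoding strategy, decoding value, temperature).
MODELS = ["gpt2-large", "gpt2-xl", "facebook/opt-6.7b", "facebook/opt-13b"]
STRATEGIES = ["top_p", "typical_p"]   # nucleus vs. typical decoding
DECODING_VALUES = [0.7, 0.95]
TEMPERATURES = [0.7, 0.9]
MAX_NEW_TOKENS = 512

def generate_completions(prompt: str, model_name: str = "gpt2-large"):
    """Generate one completion per (strategy, value, temperature) combination."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    inputs = tokenizer(prompt, return_tensors="pt")

    completions = []
    for strategy, value, temp in product(STRATEGIES, DECODING_VALUES, TEMPERATURES):
        gen_kwargs = {
            "do_sample": True,
            "temperature": temp,
            "max_new_tokens": MAX_NEW_TOKENS,
            strategy: value,  # sets either top_p=value or typical_p=value
        }
        with torch.no_grad():
            output = model.generate(**inputs, **gen_kwargs)
        completions.append(tokenizer.decode(output[0], skip_special_tokens=True))
    return completions
```

Iterating `generate_completions` over `MODELS` reproduces the full grid of generation settings listed in the row above.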