Segmenting Watermarked Texts From Language Models

Authors: Xingchi Li, Guanxun Li, Xianyang Zhang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To validate our technique, we apply it to texts generated by several language models with prompts extracted from Google's C4 dataset and obtain encouraging numerical results.
Researcher Affiliation | Academia | Xingchi Li, Department of Statistics, Texas A&M University, College Station, TX 77843 (anthony.li@stat.tamu.edu); Guanxun Li, Department of Statistics, Beijing Normal University at Zhuhai, Zhuhai, Guangdong 519087 (guanxun@bnu.edu.cn); Xianyang Zhang, Department of Statistics, Texas A&M University, College Station, TX 77843 (zhangxiany@stat.tamu.edu)
Pseudocode | Yes | Algorithm 1: Seed BS-NOT for change point detection in potentially partially watermarked texts
Open Source Code | Yes | We release all code publicly at https://github.com/doccstat/llm-watermark-cpd.
Open Datasets | Yes | We conduct extensive real-data-based experiments following a similar empirical setting in Kirchenbauer et al. [2023a], where we generate watermarked text based on the prompts sampled from the news-like subset of the colossal clean crawled corpus (C4) dataset [Raffel et al., 2020].
Dataset Splits | No | The paper does not provide specific details on validation dataset splits, percentages, or methodology. While it mentions validating its *technique* generally, it does not specify a distinct 'validation set' split from the C4 dataset or other experimental data.
Hardware Specification | No | The paper mentions the "Arseven Computing Cluster at the Department of Statistics, Texas A&M University" but does not specify any CPU or GPU models or other detailed hardware specifications.
Software Dependencies | No | The paper mentions specific LLMs (e.g., openai-community/gpt2, facebook/opt-1.3b, Meta-Llama-3-8B) and GNU Parallel, but it does not give version numbers for these or for any other software libraries or dependencies used in the experiments.
Experiment Setup | Yes | We fix the length of text m = 500, the size of sliding window B = 20, and the block size used in the block bootstrap-based test B = 20. ... In Algorithm 1, we set the decay parameter a = 2 and the minimum length of the intervals generated by Seed BS to be 50 such that the block bootstrap-based test is meaningful, and the threshold ζ ∈ {0.05, 0.01, 0.005, 0.001}.
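Seed BS is not defined in this summary. Assuming it refers to the deterministic seeded intervals of seeded binary segmentation (Kovács et al., 2023), with the decay a = 2 here corresponding to a decay of 1/2 in that paper's parameterization (an assumption), the interval set for a text of m = 500 tokens with minimum interval length 50 could be generated as in this sketch. The function name `seeded_intervals` and the half-open [start, end) convention are illustrative choices, not taken from the released code.

```python
import math


def seeded_intervals(n: int, decay: float = 2.0,
                     min_len: int = 50) -> list[tuple[int, int]]:
    """Seeded intervals [start, end) over token positions 0..n-1 (sketch).

    Layer k (k = 1, 2, ...) holds 2 * ceil(decay**(k-1)) - 1 intervals of
    length n / decay**(k-1), evenly shifted from the start to the end of
    the sequence. Intervals shorter than min_len tokens are dropped.
    """
    if decay <= 1:
        raise ValueError("decay must exceed 1")
    intervals = set()
    num_layers = math.ceil(math.log(n, decay))
    for k in range(1, num_layers + 1):
        length = n / decay ** (k - 1)
        if length < min_len:
            break  # every deeper layer is shorter still
        count = 2 * math.ceil(decay ** (k - 1)) - 1
        # Even spacing so the first interval starts at 0 and the last ends at n.
        shift = (n - length) / (count - 1) if count > 1 else 0.0
        for i in range(count):
            start = math.floor(i * shift)
            end = min(math.ceil(i * shift + length), n)
            if end - start >= min_len:
                intervals.add((start, end))
    return sorted(intervals)
```

Under these assumptions, m = 500 keeps the first four layers (interval lengths 500, 250, 125 and about 63 tokens), 26 intervals in total; the fifth layer (about 31 tokens) falls below the minimum length of 50.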
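The block bootstrap-based test is described here only by its block size (20). As a generic illustration of the resampling step, and not the paper's actual test, a moving-block bootstrap draws overlapping blocks of consecutive values, for instance from a hypothetical sequence of per-token scores, and concatenates them until the original length is reached, so short-range dependence inside each block is preserved. The function name and the overlapping-block scheme are assumptions.

```python
import random
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def moving_block_bootstrap(x: Sequence[T], block_size: int = 20,
                           rng: Optional[random.Random] = None) -> list[T]:
    """Return one moving-block bootstrap resample of x (sketch)."""
    rng = rng or random.Random()
    n = len(x)
    if not 1 <= block_size <= n:
        raise ValueError("block_size must be between 1 and len(x)")
    # Overlapping blocks may start at any position 0..n - block_size.
    starts = range(n - block_size + 1)
    out: list[T] = []
    while len(out) < n:
        s = rng.choice(starts)
        out.extend(x[s:s + block_size])  # copy one contiguous block
    return out[:n]  # trim the last block if it overshoots
```

A fixed `random.Random(seed)` can be passed as `rng` to make resamples reproducible across runs.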
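Algorithm 1 itself is not reproduced in this summary. Going only by its name, Seed BS-NOT plausibly pairs seeded intervals with a narrowest-over-threshold (NOT) search. Among intervals whose p-value falls below the threshold ζ, the search takes the narrowest, records its estimated change point, and repeats on the segments to either side. The sketch assumes this reading. The callable `test(s, e)` returning a p-value and a candidate location is hypothetical and stands in for the paper's block bootstrap-based test. It must place the location strictly inside the interval.

```python
from typing import Callable, Sequence

# Hypothetical test: test(s, e) -> (p_value, tau) on the half-open interval
# [s, e), where tau is the estimated change point with s < tau < e.
Test = Callable[[int, int], tuple[float, int]]


def seeded_not(intervals: Sequence[tuple[int, int]], test: Test,
               zeta: float) -> list[int]:
    """Narrowest-over-threshold search over a fixed set of intervals (sketch)."""
    if not intervals:
        return []
    # Each interval is tested once; its result does not depend on the segment.
    results = {iv: test(*iv) for iv in intervals}
    found: list[int] = []

    def search(lo: int, hi: int) -> None:
        # Significant intervals (p-value below zeta) lying entirely in [lo, hi).
        cands = [(e - s, s, e) for (s, e), (p, _) in results.items()
                 if lo <= s and e <= hi and p < zeta]
        if not cands:
            return
        _, s, e = min(cands)  # narrowest first; ties go to the earliest start
        tau = results[(s, e)][1]
        if not s < tau < e:
            raise ValueError(f"test returned tau={tau} outside ({s}, {e})")
        found.append(tau)
        search(lo, tau)  # segment left of the change point
        search(tau, hi)  # segment right of the change point

    search(min(s for s, _ in intervals), max(e for _, e in intervals))
    return sorted(found)
```

Requiring tau to lie strictly inside the chosen interval guarantees that interval is excluded from both sub-searches, so the recursion always terminates.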