reproducibilityindex.ai

Segmenting Watermarked Texts From Language Models

Authors: Xingchi Li, Guanxun Li, Xianyang Zhang

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	To validate our technique, we apply it to texts generated by several language models with prompts extracted from Google's C4 dataset and obtain encouraging numerical results.
Researcher Affiliation	Academia	Xingchi Li, Department of Statistics, Texas A&M University, College Station, TX 77843 (anthony.li@stat.tamu.edu); Guanxun Li, Department of Statistics, Beijing Normal University at Zhuhai, Zhuhai, Guangdong 519087 (guanxun@bnu.edu.cn); Xianyang Zhang, Department of Statistics, Texas A&M University, College Station, TX 77843 (zhangxiany@stat.tamu.edu)
Pseudocode	Yes	Algorithm 1 Seed BS-NOT for change point detection in potentially partially watermarked texts
Open Source Code	Yes	We release all code publicly at https://github.com/doccstat/llm-watermark-cpd.
Open Datasets	Yes	We conduct extensive real-data-based experiments following a similar empirical setting in Kirchenbauer et al. [2023a], where we generate watermarked text based on the prompts sampled from the news-like subset of the colossal clean crawled corpus (C4) dataset [Raffel et al., 2020].
Dataset Splits	No	The paper does not describe validation splits, split percentages, or a splitting methodology. Although it refers to validating the technique in general terms, it does not define a separate validation set drawn from the C4 dataset or from other experimental data.
Hardware Specification	No	The paper mentions the "Arseven Computing Cluster at the Department of Statistics, Texas A&M University" but does not specify CPU or GPU models or any other detailed hardware specifications.
Software Dependencies	No	The paper mentions using specific LLM models (e.g., openai-community/gpt2, facebook/opt-1.3b, Meta-Llama-3-8B) and GNU Parallel, but it does not provide specific version numbers for these or any other software libraries or dependencies used in the experiments.
Experiment Setup	Yes	We fix the length of text m = 500, the size of sliding window B = 20, and the block size used in the block bootstrap-based test B = 20. ... In Algorithm 1, we set the decay parameter a = 2 and the minimum length of the intervals generated by Seed BS to be 50 such that the block bootstrap-based test is meaningful, and the threshold ζ ∈ {0.05, 0.01, 0.005, 0.001}.
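The Seed BS-NOT procedure named in Algorithm 1, with the settings in the Experiment Setup row (decay parameter a = 2, minimum interval length 50, threshold ζ), can be illustrated by a minimal sketch under stated assumptions. The interval generator follows the seeded-interval construction of Kovács et al. (seeded binary segmentation): layer k uses intervals of length m / a^(k-1), with 2⌈a^(k-1)⌉ - 1 evenly shifted copies. The search then applies a narrowest-over-threshold (NOT) rule, keeping the shortest seeded interval whose p-value falls below ζ. The names `seeded_intervals` and `seed_bs_not` and the `p_value`/`locate` callbacks are hypothetical placeholders, not taken from the authors' repository; the actual test statistic and change point estimator are defined in the paper and in the released code.

```python
import math
from typing import Callable, List, Tuple

Interval = Tuple[int, int]  # half-open index range [start, end)


def seeded_intervals(n: int, a: float = 2.0, min_len: int = 50) -> List[Interval]:
    """Deterministic seeded intervals (assumed Kovacs et al. construction):
    layer k has length n / a**(k-1) and 2*ceil(a**(k-1)) - 1 shifted copies."""
    if a <= 1:
        raise ValueError("decay parameter a must exceed 1")
    intervals = set()
    k = 1
    while n / a ** (k - 1) >= min_len:
        length = n / a ** (k - 1)
        count = 2 * math.ceil(a ** (k - 1)) - 1
        shift = (n - length) / (count - 1) if count > 1 else 0.0
        for i in range(count):
            start = math.floor(i * shift)
            end = min(n, math.ceil(i * shift + length))
            intervals.add((start, end))
        k += 1
    return sorted(intervals)


def seed_bs_not(
    n: int,
    p_value: Callable[[int, int], float],
    locate: Callable[[int, int], int],
    zeta: float = 0.05,
    a: float = 2.0,
    min_len: int = 50,
) -> List[int]:
    """Narrowest-over-threshold search over seeded intervals.

    p_value(s, e) is a hypothetical test for a change inside [s, e);
    locate(s, e) is a hypothetical estimator returning tau with s < tau < e."""
    candidates = seeded_intervals(n, a, min_len)
    found: List[int] = []

    def search(lo: int, hi: int) -> None:
        # Seeded intervals lying inside the current segment that reject at level zeta.
        significant = [(s, e) for (s, e) in candidates
                       if lo <= s and e <= hi and p_value(s, e) < zeta]
        if not significant:
            return
        # NOT rule: the narrowest rejecting interval is least likely to hold two changes.
        s, e = min(significant, key=lambda iv: iv[1] - iv[0])
        tau = locate(s, e)
        if not s < tau < e:
            raise ValueError("locate must return a point strictly inside the interval")
        found.append(tau)
        # Recurse on the two segments on either side of the detected change point.
        search(lo, tau)
        search(tau, hi)

    search(0, n)
    return sorted(found)
```

With m = 500, a = 2 and a minimum length of 50, the generator produces four layers (lengths 500, 250, 125 and 62.5) and 26 distinct intervals, so every interval handed to the test contains at least 50 tokens, in line with the requirement that the block bootstrap-based test be meaningful.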
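The block bootstrap-based test with block size B = 20 from the Experiment Setup row can be illustrated with a generic moving block bootstrap. This is an assumption-laden stand-in, not the authors' test: it calibrates a standardized CUSUM statistic for a single mean shift in an arbitrary sequence of scores (for example, hypothetical per-token watermark scores), and the names `moving_block_bootstrap`, `max_cusum` and `block_bootstrap_pvalue` are hypothetical. The statistic actually used is defined in the paper and in the released code.

```python
import math
import random
from typing import List, Sequence


def moving_block_bootstrap(x: Sequence[float], block: int, rng: random.Random) -> List[float]:
    """Concatenate randomly chosen overlapping blocks of `block` consecutive
    values until the resample has the original length."""
    n = len(x)
    if not 1 <= block <= n:
        raise ValueError("block size must lie in [1, len(x)]")
    out: List[float] = []
    while len(out) < n:
        start = rng.randrange(0, n - block + 1)
        out.extend(x[start:start + block])
    return out[:n]


def max_cusum(x: Sequence[float]) -> float:
    """Maximum standardized CUSUM statistic for a single shift in mean."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(x) / n
    best, partial = 0.0, 0.0
    for k in range(1, n):
        partial += x[k - 1] - mean  # equals S_k - (k / n) * S_n
        best = max(best, abs(partial) / math.sqrt(k * (n - k) / n))
    return best


def block_bootstrap_pvalue(x: Sequence[float], block: int = 20,
                           n_boot: int = 499, seed: int = 0) -> float:
    """Share of block-resampled statistics at least as large as the observed one."""
    rng = random.Random(seed)
    observed = max_cusum(x)
    exceed = sum(
        max_cusum(moving_block_bootstrap(x, block, rng)) >= observed
        for _ in range(n_boot)
    )
    return (1 + exceed) / (1 + n_boot)
```

Resampling whole blocks of B consecutive values keeps short-range dependence inside each block while shuffling the block order, which roughly destroys any mean shift, so the bootstrap distribution serves as an approximate no-change reference. Restricting `x` to an interval of at least 50 values, the minimum interval length used in Algorithm 1, leaves at least a few blocks of 20 to resample.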