An Unforgeable Publicly Verifiable Watermark for Large Language Models
Authors: Aiwei Liu, Leyi Pan, Xuming Hu, Shuang Li, Lijie Wen, Irwin King, Philip S. Yu
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that our algorithm attains high detection accuracy and computational efficiency through neural networks. |
| Researcher Affiliation | Academia | 1Tsinghua University 2The Chinese University of Hong Kong 3The Hong Kong University of Science and Technology (Guangzhou) 4University of Illinois at Chicago |
| Pseudocode | Yes | Algorithm 1 Watermark Generation Step |
| Open Source Code | Yes | Our code is available at https://github.com/THU-BPM/unforgeable_watermark. |
| Open Datasets | Yes | We select the C4 (Raffel et al., 2020) and Dbpedia Class (Raffel et al., 2020) datasets |
| Dataset Splits | No | The paper describes the data used for training the watermark generation and detection networks and for evaluating the main results (500 human, 500 generated texts), but it does not specify explicit validation splits or quantities for the overall experimental setup. |
| Hardware Specification | Yes | All timings were recorded on a single V100 32G GPU. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer' but does not specify version numbers for any software dependencies like programming languages (e.g., Python), libraries (e.g., PyTorch, TensorFlow), or other tools. |
| Experiment Setup | Yes | The default hyperparameters are configured as follows: watermark token ratio γ of 0.5, window size w of 5, five token embedding layers, and δ = 2 for the generator. The detector comprises two LSTM layers as well as the same token embedding layers. For decoding, Top-K employs K = 20 and Beam Search utilizes a beam width of 8. Both networks adopt a learning rate of 0.01, optimized via the Adam optimizer (Kingma & Ba, 2014). Due to the varying sensitivity of language models to the watermark, for ease of comparison, the detection z-score thresholds for GPT-2, OPT 1.3B, and LLaMA 7B are set to 1, 1, and 3, respectively. |
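The "Algorithm 1 Watermark Generation Step" referenced in the Pseudocode row follows the paper's two-network design: a stack of shared token-embedding layers feeds a small generation network that labels each candidate token as watermarked (green) or not, based on a window of the w most recent tokens, and δ is added to the logits of green candidates. The sketch below is a minimal reconstruction of that step, not the authors' released code: the names `WatermarkGenerator` and `watermarked_logits`, the layer widths, and the restriction to top-k candidates are assumptions made here for illustration.

```python
import torch
import torch.nn as nn


class WatermarkGenerator(nn.Module):
    """Sketch of the watermark generation network: shared token-embedding
    layers followed by a classifier that labels a candidate token as
    watermarked (green) given a window of the w most recent tokens.
    Layer sizes are illustrative, not taken from the paper."""

    def __init__(self, vocab_size, embed_dim=64, window_size=5, num_embed_layers=5):
        super().__init__()
        self.window_size = window_size
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Stack of shared token-embedding layers (five by default, per the paper).
        layers = []
        for _ in range(num_embed_layers - 1):
            layers += [nn.Linear(embed_dim, embed_dim), nn.ReLU()]
        self.shared = nn.Sequential(*layers)
        self.classifier = nn.Linear(window_size * embed_dim, 1)

    def forward(self, window_ids):
        # window_ids: (batch, window_size) = last w-1 context tokens + 1 candidate.
        h = self.shared(self.embed(window_ids))               # (batch, w, d)
        return torch.sigmoid(self.classifier(h.flatten(1)))   # P(green), (batch, 1)


def watermarked_logits(logits, prev_ids, generator, delta=2.0, top_k=20):
    """One watermark generation step (hypothetical helper): add delta to the
    logits of the top-k candidates that the generator labels as green.
    Assumes prev_ids already holds at least w-1 tokens."""
    topk = torch.topk(logits, top_k).indices                  # (top_k,)
    w = generator.window_size
    ctx = prev_ids[-(w - 1):]                                 # last w-1 tokens
    # Build one window per candidate: context followed by the candidate token.
    windows = torch.stack([torch.cat([ctx, c.view(1)]) for c in topk])
    green = generator(windows).squeeze(-1) > 0.5              # boolean mask, (top_k,)
    out = logits.clone()
    out[topk[green]] += delta                                 # boost green candidates
    return out
```

Because only the detection network (not the generator's key) needs to be published, this split is what makes the watermark publicly verifiable yet unforgeable; the sketch above covers only the generation side.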
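For convenience, the defaults quoted in the Experiment Setup row can be collected into a single configuration object. The dataclass below is purely illustrative; the paper defines no such object, and the field names are chosen here.

```python
from dataclasses import dataclass


@dataclass
class UPVConfig:
    """Default hyperparameters as reported in the paper's experiment setup.
    Field names are assumptions made for this sketch."""
    gamma: float = 0.5            # watermark (green) token ratio
    window_size: int = 5          # w, tokens seen by the generation network
    num_embed_layers: int = 5     # shared token-embedding layers
    delta: float = 2.0            # logit boost for watermarked tokens
    num_lstm_layers: int = 2      # detector: LSTM layers over shared embeddings
    top_k: int = 20               # Top-K decoding
    beam_width: int = 8           # Beam Search decoding
    learning_rate: float = 0.01   # Adam, for both networks
    # Detection z-score thresholds vary per model:
    # GPT-2: 1, OPT 1.3B: 1, LLaMA 7B: 3.
```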