Optimizing Watermarks for Large Language Models

Authors: Bram Wouters

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This paper introduces a systematic approach to this trade-off in terms of a multi-objective optimization problem. For a large class of robust, efficient watermarks, the associated Pareto optimal solutions are identified and shown to outperform existing robust, efficient watermarks. ... Our contribution. For a large class of robust, efficient watermarks based on the green-red split of the vocabulary, we translate the test-text trade-off into a multi-objective optimization problem and identify the associated Pareto optimal solutions. We empirically validate the optimality of the solutions and show that they outperform existing proposals of robust, efficient watermarks (Kirchenbauer et al., 2023; Kuditipudi et al., 2024; Wu et al., 2023) with respect to the test-text trade-off. ... 4. Experiments (A minimal sketch of the green-red split mechanism follows after the table.)
Researcher Affiliation | Academia | University of Amsterdam. Correspondence to: Bram Wouters <b.m.wouters@uva.nl>.
Pseudocode | No | The paper contains mathematical equations and descriptions of functions but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/brwo/optimizing-watermarks.
Open Datasets | Yes | From the C4 dataset (Raffel et al., 2020) a sample of 500 (news) articles is drawn randomly. (A sketch of sampling articles from C4 follows after the table.)
Dataset Splits | No | The paper describes how texts are generated for evaluation purposes using pre-trained LLMs, but it does not specify training/validation/test splits for a model trained within the scope of this paper.
Hardware Specification | No | The paper mentions the use of specific LLMs (e.g., OPT-1.3B, BART-large) but does not specify the hardware (e.g., GPU or CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions the 'Huggingface library (Wolf et al., 2020)' but does not provide specific version numbers for software dependencies.
Experiment Setup | Yes | Sampling from the LLM takes place with a temperature of 1.0. In order to generate sequences of a fixed length T = 30, the EOS token is suppressed. ... For TS we use the BART-large model (Liu et al., 2020)... For MT we use the WMT 2016 dataset and use the Multilingual BART model (Liu et al., 2020)... We used the default sampling strategy: beam search with 4 beams. (A sketch of these generation settings follows after the table.)
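
The Research Type row above quotes the paper's framing: robust, efficient watermarks based on a green-red split of the vocabulary, optimized as a multi-objective (test-text) trade-off. For orientation, the following is a minimal sketch of how such a green-red watermark biases sampling, in the style of Kirchenbauer et al. (2023); the seeding scheme, green-list fraction gamma, and bias delta are illustrative assumptions and not the paper's Pareto-optimal construction.

```python
# Minimal sketch of a green-red split watermark applied to next-token logits,
# in the style of Kirchenbauer et al. (2023). The seeding scheme, green
# fraction gamma, and bias delta are illustrative assumptions, not the
# paper's Pareto-optimal construction.
import torch

def greenlist_mask(prev_token_id: int, vocab_size: int,
                   gamma: float = 0.5, key: int = 15485863) -> torch.Tensor:
    """Pseudo-randomly mark a fraction gamma of the vocabulary as 'green',
    keyed on the previous token so the split changes at every position."""
    gen = torch.Generator().manual_seed(key * (prev_token_id + 1))
    perm = torch.randperm(vocab_size, generator=gen)
    mask = torch.zeros(vocab_size, dtype=torch.bool)
    mask[perm[: int(gamma * vocab_size)]] = True
    return mask

def watermarked_logits(logits: torch.Tensor, prev_token_id: int,
                       delta: float = 2.0) -> torch.Tensor:
    """Add a bias delta to green-list tokens; sampling from the biased logits
    over-represents green tokens, which a detector can later test for."""
    green = greenlist_mask(prev_token_id, logits.shape[-1])
    return logits + delta * green.to(logits.dtype)
```

A detector that knows the key recomputes the green list at every position and counts green tokens in a candidate text; a count well above gamma * T is evidence of the watermark.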
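The Open Datasets row cites a random sample of 500 (news) articles from C4 (Raffel et al., 2020). A hedged sketch of drawing such a sample with the Hugging Face datasets library is given below; the realnewslike configuration, the split, and the shuffle seed are assumptions for illustration and are not specified in the quoted text.

```python
# Hedged sketch: draw 500 news-style articles from C4 with the Hugging Face
# `datasets` library. The "realnewslike" config, the validation split, and the
# shuffle seed are assumptions for illustration.
from datasets import load_dataset

c4 = load_dataset("allenai/c4", "realnewslike", split="validation", streaming=True)
articles = [row["text"] for row, _ in zip(c4.shuffle(seed=0, buffer_size=10_000), range(500))]
print(len(articles), articles[0][:80])
```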
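The Experiment Setup row quotes sampling at temperature 1.0 with a fixed length T = 30 and a suppressed EOS token, with OPT-1.3B among the models used. The sketch below shows one way these settings map onto the Hugging Face transformers generation API; the prompt and the use of suppress_tokens to block the EOS token are assumptions about how the quoted setup translates into code.

```python
# Hedged sketch of the quoted generation settings (temperature 1.0, fixed
# length T = 30, EOS suppressed) with Hugging Face transformers and OPT-1.3B.
# The prompt and the choice of `suppress_tokens` to block EOS are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")

prompt = tok("The city council announced on Tuesday that", return_tensors="pt")
out = model.generate(
    **prompt,
    do_sample=True,
    temperature=1.0,                     # sampling temperature 1.0
    min_new_tokens=30,                   # force exactly T = 30 new tokens
    max_new_tokens=30,
    suppress_tokens=[tok.eos_token_id],  # suppress EOS so generation reaches T
)
print(tok.decode(out[0, prompt["input_ids"].shape[1]:], skip_special_tokens=True))
```

For the summarization and translation models (BART-large and multilingual BART), the quoted default of beam search with 4 beams would correspond to calling the same generate API with num_beams=4 and do_sample=False.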