Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
CATER: Intellectual Property Protection on Text Generation APIs via Conditional Watermarks
Authors: Xuanli He, Qiongkai Xu, Yi Zeng, Lingjuan Lyu, Fangzhao Wu, Jiwei Li, Ruoxi Jia
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we observe that high-order conditions lead to an exponential growth of suspicious (unused) watermarks, making our crafted watermarks more stealthy. In addition, CATER can effectively identify IP infringement under architectural mismatch and cross-domain imitation attacks, with negligible impairments on the generation quality of victim APIs. We envision our work as a milestone for stealthily protecting the IP of text generation APIs. ... 4 Experiments Text Generation Tasks. We examine two widespread text generation tasks: machine translation and document summarization, which have been successfully deployed as commercial APIs. To demonstrate the generality of CATER, we also apply it to two more text generation tasks: i) text simplification and ii) paraphrase generation. |
| Researcher Affiliation | Collaboration | Xuanli He (University College London); Qiongkai Xu (University of Melbourne); Yi Zeng (Virginia Tech); Lingjuan Lyu (Sony AI); Fangzhao Wu (Microsoft Research Asia); Jiwei Li (Shannon.AI, Zhejiang University); Ruoxi Jia (Virginia Tech) |
| Pseudocode | No | The paper presents mathematical formulations and optimization problems but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and data are available at: https://github.com/xlhex/cater_neurips.git |
| Open Datasets | Yes | Machine Translation: We consider WMT14 German (De) → English (En) translation [2] as the testbed. We follow the official split: train (4.5M) / dev (3,000) / test (3,003). ... Document summarization: CNN/DM [14] utilizes informative headlines as summaries of news articles. We reuse the dataset preprocessed by See et al. [38] with a partition of train/dev/test as 287K / 13K / 11K. |
| Dataset Splits | Yes | Machine Translation: We consider WMT14 German (De) → English (En) translation [2] as the testbed. We follow the official split: train (4.5M) / dev (3,000) / test (3,003). ... Document summarization: CNN/DM [14] utilizes informative headlines as summaries of news articles. We reuse the dataset preprocessed by See et al. [38] with a partition of train/dev/test as 287K / 13K / 11K. (A split-size check sketch appears below the table.) |
| Hardware Specification | No | The paper does not specify the hardware used for experiments, such as GPU models, CPU types, or cloud computing instance details. It only mentions 'Transformer-base' models. |
| Software Dependencies | No | The paper mentions software such as Gurobi, Moses, BART, and Transformer-base models, but it does not give version numbers for any of them, so the ancillary software environment cannot be reproduced exactly. |
| Experiment Setup | Yes | We use 32K and 16K BPE vocabulary [39] for experiments on WMT14 and CNN/DM, respectively. ... We set the size of synonyms to 2 and vary this value in Appendix F.1. The detailed construction of watermarks and approximation of p in Equation 1 for CATER is provided in Appendix D. ... The training details are summarized in Appendix D. (A BPE-vocabulary sketch also follows the table.) |
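The split sizes quoted in the Open Datasets and Dataset Splits rows can be sanity-checked against public copies of the corpora. Below is a minimal sketch using the Hugging Face `datasets` loaders, which the paper does not mention; the authors follow the official WMT14 split and the See et al. [38] preprocessed CNN/DM, so the counts reported by these loaders may differ slightly from theirs.

```python
# Hedged sketch: compare public split sizes with the figures quoted in the table
# (WMT14 De-En: ~4.5M / 3,000 / 3,003; CNN/DM: ~287K / 13K / 11K).
# The Hugging Face loaders below are an assumption for illustration, not the
# preprocessing pipeline used in the paper.
from datasets import load_dataset

wmt = load_dataset("wmt14", "de-en")             # train / validation / test
cnndm = load_dataset("cnn_dailymail", "3.0.0")   # train / validation / test

for name, ds in [("WMT14 De-En", wmt), ("CNN/DM", cnndm)]:
    sizes = {split: len(ds[split]) for split in ds}
    print(name, sizes)
```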
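The Experiment Setup row quotes 32K and 16K BPE vocabularies [39] for WMT14 and CNN/DM. The sketch below learns merge operations of those sizes with the `subword-nmt` toolkit, which implements the BPE scheme cited as [39]; the file paths and the `build_bpe` helper are illustrative placeholders rather than artifacts released by the authors, and the exact preprocessing in the paper may differ.

```python
# Hedged sketch: learn BPE codes of the sizes stated in the Experiment Setup row.
# Paths such as "wmt14.train.de-en.txt" are hypothetical examples.
from subword_nmt.learn_bpe import learn_bpe
from subword_nmt.apply_bpe import BPE

def build_bpe(train_path, codes_path, num_merges):
    """Learn BPE merge operations from raw training text and return a segmenter."""
    with open(train_path, encoding="utf-8") as infile, \
         open(codes_path, "w", encoding="utf-8") as outfile:
        learn_bpe(infile, outfile, num_symbols=num_merges)
    with open(codes_path, encoding="utf-8") as codes:
        return BPE(codes)

# 32K merges for the WMT14 De-En data, 16K for CNN/DM, as quoted above.
wmt_bpe = build_bpe("wmt14.train.de-en.txt", "wmt14.codes", 32000)
cnndm_bpe = build_bpe("cnndm.train.txt", "cnndm.codes", 16000)

print(wmt_bpe.process_line("Watermarking text generation APIs protects intellectual property."))
```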