Tracing Text Provenance via Context-Aware Lexical Substitution
Authors: Xi Yang, Jie Zhang, Kejiang Chen, Weiming Zhang, Zehua Ma, Feng Wang, Nenghai Yu
AAAI 2022, pp. 11613-11621
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that, under both objective and subjective metrics, our watermarking scheme can well preserve the semantic integrity of original sentences and has a better transferability than existing methods. Besides, the proposed LS approach outperforms the state-of-the-art approach on the Stanford Word Substitution Benchmark. |
| Researcher Affiliation | Academia | University of Science and Technology of China {yx9726@mail., zjzac@mail., chenkj@mail., zhangwm@, mzh045@mail., nishi@mail., ynh@}ustc.edu.cn |
| Pseudocode | Yes | Algorithm 1 Context-Aware Lexical Substitution; Algorithm 2 Sequence Incremental Watermark Embedding |
| Open Source Code | No | The paper does not provide an explicit code-release statement or a link to an open-source implementation of its method. |
| Open Datasets | Yes | We choose datasets with different writing styles, namely, Novels, Wiki Text-2, IMDB, and Ag News. For Novels, we select Wuthering Heights, Dracula, and Pride and Prejudice from Project Gutenberg2. For the rest datasets, we select the first 10,000 sentences each from the Wiki Text-2, IMDB, and Ag News datasets provided by Hugging Face3. (2https://www.gutenberg.org/, 3https://huggingface.co/datasets) |
| Dataset Splits | No | The paper mentions datasets used but does not specify the exact train/validation/test splits (e.g., percentages or sample counts) needed for reproduction. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions pre-trained models (bert-base-cased, roberta-large-mnli, stsb-roberta-base-v2) and NLTK, but does not specify version numbers for general software dependencies like Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | We set f = 1 by default in Algorithm 2 and K = 32 when generating candidates. |
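The K = 32 setting above refers to the number of substitution candidates drawn per target word from the masked language model (the paper uses bert-base-cased). As a minimal illustrative sketch of that top-K selection step, with a toy vocabulary and random scores standing in for the real model's logits (not the authors' code):

```python
import numpy as np

def top_k_candidates(logits, vocab, k=32):
    """Return the k vocabulary entries with the highest logits,
    in descending score order."""
    idx = np.argsort(logits)[::-1][:k]
    return [vocab[i] for i in idx]

rng = np.random.default_rng(0)
vocab = [f"word_{i}" for i in range(100)]  # stand-in for the LM vocabulary
logits = rng.normal(size=len(vocab))       # stand-in for masked-LM logits
candidates = top_k_candidates(logits, vocab, k=32)
print(len(candidates))
```

In the actual method the candidates would then be filtered for context fit before watermark embedding; this sketch only shows the top-K cut itself.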