Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Can Watermarked LLMs be Identified by Users via Crafted Prompts?
Authors: Aiwei Liu, Sheng Guan, Yiming Liu, Leyi Pan, Yifei Zhang, Liancheng Fang, Lijie Wen, Philip Yu, Xuming Hu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that almost all mainstream watermarking algorithms are easily identified with our well-designed prompts, while Water-Probe demonstrates a minimal false positive rate for non-watermarked LLMs. In our experiments, we demonstrate that the Water-Probe algorithm achieves high accuracy in detecting various types of watermarked LLMs. 4 EXPERIMENT ON WATERMARKED LLM IDENTIFICATION |
| Researcher Affiliation | Academia | 1 Tsinghua University; 2 Beijing University of Posts and Telecommunications; 3 The Chinese University of Hong Kong; 4 University of Illinois at Chicago; 5 Hong Kong University of Science and Technology (Guangzhou) |
| Pseudocode | Yes | We provide the detailed steps of the Water-Probe algorithm in Algorithm 1 in the appendix. |
| Open Source Code | Yes | [Official]:https://github.com/THU-BPM/Watermarked_LLM_Identification |
| Open Datasets | Yes | For watermarked text detection, we used OPT-2.7B to generate texts on the C4 dataset (Raffel et al., 2020) |
| Dataset Splits | No | No explicit training/test/validation dataset splits are provided for the Water-Probe algorithm's evaluation or for the main LLM identification task. The C4 dataset is mentioned for generating texts in a separate watermarked text detection context, not for defining splits for the primary experimental setup. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running the experiments are provided in the paper. |
| Software Dependencies | No | The paper mentions using the 'MarkLLM (Pan et al., 2024) framework' but does not specify its version number or any other software dependencies with their versions. |
| Experiment Setup | Yes | For all LLMs, the sampling temperature was set to 1, with the number of samples set to 10^4. ... We set µ = 0.1 for our experiments. |