DE-COP: Detecting Copyrighted Content in Language Models Training Data
Authors: André Vicente Duarte, Xuandong Zhao, Arlindo L. Oliveira, Lei Li
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that DE-COP surpasses the prior best method by 9.6% in detection performance (AUC) on models with logits available. Moreover, DE-COP also achieves an average accuracy of 72% for detecting suspect books on fully black-box models where prior methods give approximately 4% accuracy. |
| Researcher Affiliation | Academia | ¹INESC-ID / Instituto Superior Técnico, ULisboa; ²University of California, Santa Barbara; ³Carnegie Mellon University. |
| Pseudocode | Yes | Algorithm 1: DE-COP; Logit Calibration Algorithm (Calculating Label Adjustment). |
| Open Source Code | Yes | The code and datasets are available at https://github.com/LeiLiLab/DE-COP. |
| Open Datasets | Yes | We create two new benchmarks: BookTection and arXivTection. ... The code and datasets are available at https://github.com/LeiLiLab/DE-COP. |
| Dataset Splits | No | The paper defines 'Suspect' and 'Clean' groups within its custom benchmarks (BookTection and arXivTection) for evaluation purposes, but it does not specify explicit train/validation/test splits for any model discussed in the paper. |
| Hardware Specification | Yes | In this study, we also used a computing cluster equipped with four NVIDIA A100 80GB GPUs, which enabled us to run all open-source models efficiently, eliminating the need for model quantization. |
| Software Dependencies | No | The paper lists the names of LLMs used (Mistral, Mixtral, LLaMA-2, GPT-3, ChatGPT, Claude) but does not provide specific version numbers for these or any other ancillary software libraries or frameworks. |
| Experiment Setup | Yes | When generating paraphrases, our model requires a certain level of creativity to produce three different examples for each query. Therefore, we set the temperature=0.1 to achieve this. In contrast, when using models for evaluation, we aim for maximum determinism, thus we set the temperature=0. The prompts we use in the models for evaluating on the BookTection benchmark can be found in Appendix C. ... We oversample each example by creating every possible combination in a 4-option multiple-choice question format, resulting in 24 permutations. (A sketch of this oversampling appears below the table.) |
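
For concreteness, below is a minimal sketch of the 24-permutation oversampling described in the Experiment Setup row. It assumes a hypothetical `query_model` helper (not part of the authors' released code) that poses the 4-option multiple-choice question at temperature=0 and returns the index of the chosen option; all names here are illustrative, not the paper's implementation.

```python
from itertools import permutations

def build_permuted_questions(options):
    """Build every ordering of a 4-option multiple-choice question.

    `options` holds one verbatim passage and three paraphrases;
    4! = 24 orderings oversample each example, as described above.
    """
    assert len(options) == 4
    return list(permutations(options))

def verbatim_pick_rate(passage, paraphrases, query_model):
    """Tally how often the model picks the verbatim passage across
    all 24 option orderings.

    `query_model` is a hypothetical helper that asks the question
    at temperature=0 (maximum determinism) and returns the index
    of the option the model selects.
    """
    hits = 0
    orderings = build_permuted_questions([passage, *paraphrases])
    for ordering in orderings:
        choice = query_model(ordering)      # deterministic evaluation call
        if ordering[choice] == passage:     # model selected the verbatim text
            hits += 1
    return hits / len(orderings)            # fraction of correct picks
```

Roughly speaking, in the paper's terms, a high pick rate on passages from a 'Suspect' book relative to the 'Clean' baseline is the detection signal.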