DE-COP: Detecting Copyrighted Content in Language Models Training Data
Authors: André Vicente Duarte, Xuandong Zhao, Arlindo L. Oliveira, Lei Li
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that DE-COP surpasses the prior best method by 9.6% in detection performance (AUC) on models with logits available. Moreover, DE-COP also achieves an average accuracy of 72% for detecting suspect books on fully black-box models where prior methods give approximately 4% accuracy. |
| Researcher Affiliation | Academia | ¹INESC-ID / Instituto Superior Técnico, ULisboa; ²University of California, Santa Barbara; ³Carnegie Mellon University. |
| Pseudocode | Yes | Algorithm 1: DE-COP; Logit Calibration Algorithm (Calculating Label Adjustment). |
| Open Source Code | Yes | The code and datasets are available at https://github.com/LeiLiLab/DE-COP. |
| Open Datasets | Yes | We create two new benchmarks: BookTection and arXivTection. ... The code and datasets are available at https://github.com/LeiLiLab/DE-COP. |
| Dataset Splits | No | The paper defines 'Suspect' and 'Clean' groups within its custom benchmarks (BookTection and arXivTection) for evaluation purposes, but it does not specify explicit train/validation/test splits for any model discussed in the paper. |
| Hardware Specification | Yes | In this study, we also used a computing cluster equipped with four NVIDIA A100 80GB GPUs, which enabled us to run all open-source models efficiently, eliminating the need for model quantization. |
| Software Dependencies | No | The paper lists the names of LLMs used (Mistral, Mixtral, LLaMA-2, GPT-3, ChatGPT, Claude) but does not provide specific version numbers for these or any other ancillary software libraries or frameworks. |
| Experiment Setup | Yes | When generating paraphrases, our model requires a certain level of creativity to produce three different examples for each query. Therefore, we set the temperature=0.1 to achieve this. In contrast, when using models for evaluation, we aim for maximum determinism, thus we set the temperature=0. The prompts we use in the models for evaluating on the BookTection benchmark can be found in Appendix C. ... We oversample each example by creating every possible combination in a 4-option multiple-choice question format, resulting in 24 permutations. (A sketch of this oversampling appears below the table.) |
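
For concreteness, below is a minimal sketch of the 24-permutation oversampling described in the Experiment Setup row. It assumes a hypothetical `query_model` helper (not part of the authors' released code) that poses the 4-option multiple-choice question at temperature=0 and returns the index of the chosen option; all names here are illustrative, not the paper's implementation.

```python
from itertools import permutations

def build_permuted_questions(options):
    """Build every ordering of a 4-option multiple-choice question.

    `options` holds one verbatim passage and three paraphrases;
    4! = 24 orderings oversample each example, as described above.
    """
    assert len(options) == 4
    return list(permutations(options))

def verbatim_pick_rate(passage, paraphrases, query_model):
    """Tally how often the model picks the verbatim passage across
    all 24 option orderings.

    `query_model` is a hypothetical helper that asks the question
    at temperature=0 (maximum determinism) and returns the index
    of the option the model selects.
    """
    hits = 0
    orderings = build_permuted_questions([passage, *paraphrases])
    for ordering in orderings:
        choice = query_model(ordering)      # deterministic evaluation call
        if ordering[choice] == passage:     # model selected the verbatim text
            hits += 1
    return hits / len(orderings)            # fraction of correct picks
```

Roughly speaking, in the paper's terms, a high pick rate on passages from a 'Suspect' book relative to the 'Clean' baseline is the detection signal.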