Detecting Pretraining Data from Large Language Models

Authors: Weijia Shi, Anirudh Ajith, Mengzhou Xia, Yangsibo Huang, Daogao Liu, Terra Blevins, Danqi Chen, Luke Zettlemoyer

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate that MIN-K% PROB achieves a 7.4% improvement on WIKIMIA over these previous methods. We apply MIN-K% PROB to three real-world scenarios, copyrighted book detection, contaminated downstream example detection and privacy auditing of machine unlearning, and find it a consistently effective solution.
Researcher Affiliation | Academia | University of Washington; Princeton University
Pseudocode | Yes | Algorithm 1 Pretraining Data Detection
Open Source Code | No | The paper provides a project website URL (swj0419.github.io/detect-pretrain.github.io), which is a high-level overview page and not a direct link to a source-code repository.
Open Datasets | Yes | We construct our benchmark by using events added to Wikipedia after specific dates, treating them as non-member data... We used the Wikipedia API to automatically retrieve articles... Books3 subset of the Pile dataset (Gao et al., 2020; Min et al., 2023)... RedPajama corpus (Together Compute, 2023)... RealTimeData News August 2023, containing post-2023 news absent from LLaMA pretraining... https://huggingface.co/datasets/RealTimeData/News_August_2023
Dataset Splits | Yes | Validation data to determine detection threshold. We construct a validation set using 50 books known to be memorized by ChatGPT... For negative examples, we collected 50 new books with first editions in 2023... From each book, we randomly extract 100 snippets of 512 words, creating a balanced validation set of 10,000 examples.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or cloud computing instance specifications used for running the experiments.
Software Dependencies | No | The paper mentions specific language models (e.g., LLaMA, GPT-Neo, Pythia) and a tool (ChatGPT) used for tasks within the research, but it does not list specific versions of programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow versions) used for their own implementation.
Experiment Setup | Yes | The key hyperparameter of MIN-K% PROB is the percentage of tokens with the highest negative log-likelihood we select to form the top-k% set. We performed a small sweep over 10, 20, 30, 40, 50 on a held-out validation set using the LLAMA-60B model and found that k = 20 works best. We use this value for all experiments without further tuning... at a constant learning rate of 1e-4.
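The scoring rule behind Algorithm 1 (Min-K% Prob) can be sketched in a few lines: given the per-token log-likelihoods a model assigns to a candidate text, average only the k% lowest-likelihood tokens; member texts tend to lack very surprising tokens, so their scores are higher. This is a minimal illustrative sketch, not the authors' implementation; it assumes token log-likelihoods have already been computed with the target model, and the function name is our own.

```python
import numpy as np

def min_k_prob(token_log_likelihoods, k=0.2):
    """Min-K% Prob score: mean log-likelihood of the k% least likely tokens.

    A higher (less negative) score suggests the text is more likely to have
    appeared in pretraining data, since none of its tokens are surprisingly
    improbable under the model. k=0.2 matches the paper's chosen k = 20%.
    """
    lls = np.sort(np.asarray(token_log_likelihoods, dtype=float))  # ascending
    n = max(1, int(len(lls) * k))  # size of the bottom-k% set
    return float(lls[:n].mean())

# Toy comparison: a text with one highly surprising token scores much lower.
seen   = min_k_prob([-1.0, -1.2, -0.8, -1.1], k=0.25)
unseen = min_k_prob([-1.0, -1.2, -0.8, -9.5], k=0.25)
```

Detection then reduces to thresholding this score, with the threshold chosen on a validation set as the row above describes.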
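The Dataset Splits row notes that a balanced validation set (memorized vs. 2023 books) is used to pick the detection threshold. One simple way to do this, sketched here under our own assumptions (the paper does not specify the selection criterion; we use balanced accuracy, and the function name is illustrative):

```python
import numpy as np

def choose_threshold(member_scores, nonmember_scores):
    """Pick the score threshold t maximizing balanced accuracy on a
    labeled validation set, predicting 'member' when score >= t."""
    member_scores = np.asarray(member_scores, dtype=float)
    nonmember_scores = np.asarray(nonmember_scores, dtype=float)
    candidates = np.unique(np.concatenate([member_scores, nonmember_scores]))
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        # Mean of true-positive rate and true-negative rate at threshold t.
        acc = 0.5 * ((member_scores >= t).mean() + (nonmember_scores < t).mean())
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc
```

With Min-K% Prob scores for the 50 memorized and 50 unseen books' snippets as inputs, the returned threshold is then applied unchanged at test time.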