Bag of Tricks for Training Data Extraction from Language Models
Authors: Weichen Yu, Tianyu Pang, Qian Liu, Chao Du, Bingyi Kang, Yan Huang, Min Lin, Shuicheng Yan
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results show that several previously overlooked tricks can be crucial to the success of training data extraction. Based on the GPT-Neo 1.3B evaluation results, our proposed tricks outperform the baseline by a large margin in most cases, providing a much stronger baseline for future research. |
| Researcher Affiliation | Collaboration | 1Institute of Automation, Chinese Academy of Sciences. 2Sea AI Lab. Correspondence to: Tianyu Pang <tianyupang@sea.com>, Qian Liu <liuqian@sea.com>, Yan Huang <yhuang@nlpr.ia.ac.cn>. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/weichen-yu/LM-Extraction. |
| Open Datasets | Yes | The dataset used in this study is a subset of 20,000 examples from the Pile's training dataset (Gao et al., 2020). |
| Dataset Splits | Yes | For the purposes of this study, we divide the dataset into a training set of 19,000 samples and a testing set of 1,000 samples. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models. |
| Software Dependencies | Yes | We employ the GPT-Neo 1.3B model implemented on Hugging Face Transformers (Wolf et al., 2020). |
| Experiment Setup | Yes | Table 6 contains the detailed search parameter settings. We also provide the baseline parameters as initial values to the search algorithm to speed up convergence. The number of search rounds is limited to 1,000. The experimental results are shown in Table 5. Simply using the best parameters outlined in Section 5.1 with k=5, η=0.6, ϕ=0.6, T=0.4, r=1 yields a precision of 48.8%. Implementing auto-tuning results in a 37% improvement over the baseline, and auto-tuning performs slightly better than manual hyperparameter selection. |
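
The Dataset Splits row above reports a 19,000 / 1,000 train/test partition of the 20,000-example Pile subset. The sketch below shows one way such a split could be produced; the shuffling and fixed seed are illustrative assumptions, not the authors' documented procedure.

```python
# Minimal sketch of a 19,000 / 1,000 train/test split of the 20,000-example
# benchmark subset. Shuffling with a fixed seed is an assumption for
# reproducibility of the illustration, not necessarily the authors' method.
import random

def split_benchmark(examples, n_train=19_000, seed=0):
    assert len(examples) == 20_000
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)
    train = [examples[i] for i in indices[:n_train]]
    test = [examples[i] for i in indices[n_train:]]
    return train, test
```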
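
For the Software Dependencies and Experiment Setup rows, the sketch below loads GPT-Neo 1.3B through Hugging Face Transformers and samples a candidate suffix with the reported decoding values. Mapping k, η, ϕ, T, and r onto `top_k`, `top_p`, `typical_p`, `temperature`, and `repetition_penalty` is an assumption about the paper's notation, and the prefix string is a placeholder rather than a benchmark example.

```python
# Minimal sketch (not the authors' exact pipeline): load GPT-Neo 1.3B via
# Hugging Face Transformers and sample a candidate suffix with the decoding
# values from the Experiment Setup row. The kwarg mapping (k -> top_k,
# eta -> top_p, phi -> typical_p, T -> temperature, r -> repetition_penalty)
# is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neo-1.3B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

prefix = "Example prefix from the extraction benchmark ..."  # placeholder text
inputs = tokenizer(prefix, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        do_sample=True,
        top_k=5,                 # k = 5
        top_p=0.6,               # eta = 0.6 (assumed nucleus-sampling threshold)
        typical_p=0.6,           # phi = 0.6 (assumed typical-decoding threshold)
        temperature=0.4,         # T = 0.4
        repetition_penalty=1.0,  # r = 1 (no repetition penalty)
        max_new_tokens=50,       # generate a candidate suffix
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```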
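
The Experiment Setup row also mentions an auto-tuning search that is limited to 1,000 rounds and seeded with the baseline parameters. The sketch below illustrates such a loop using a simple random search; the paper's actual search algorithm and parameter ranges (its Table 6) may differ, and `evaluate_precision` is a hypothetical stand-in for running extraction on the training split and measuring precision.

```python
# Minimal sketch of an auto-tuning loop, assuming a random search seeded with
# the baseline parameters. The real search algorithm and ranges come from the
# paper; `evaluate_precision` is a hypothetical placeholder.
import random

SEARCH_ROUNDS = 1_000  # the number of search rounds is limited to 1,000

def evaluate_precision(params: dict) -> float:
    """Hypothetical: run extraction with `params` and return precision."""
    raise NotImplementedError

def sample_params(rng: random.Random) -> dict:
    # Illustrative ranges only; the paper's Table 6 defines the actual ranges.
    return {
        "top_k": rng.randint(1, 100),
        "eta": rng.uniform(0.1, 1.0),
        "phi": rng.uniform(0.1, 1.0),
        "temperature": rng.uniform(0.1, 2.0),
        "repetition_penalty": rng.uniform(1.0, 2.0),
    }

def auto_tune(baseline: dict, seed: int = 0) -> tuple[dict, float]:
    rng = random.Random(seed)
    # Start from the baseline parameters to speed up convergence.
    best_params, best_precision = baseline, evaluate_precision(baseline)
    for _ in range(SEARCH_ROUNDS - 1):
        candidate = sample_params(rng)
        precision = evaluate_precision(candidate)
        if precision > best_precision:
            best_params, best_precision = candidate, precision
    return best_params, best_precision
```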