SeqPATE: Differentially Private Text Generation via Knowledge Distillation

Authors: Zhiliang Tian, Yingxiu Zhao, Ziyue Huang, Yu-Xiang Wang, Nevin L. Zhang, He He

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experiments verify the effectiveness of SeqPATE in protecting both training samples and sensitive phrases.
Researcher Affiliation | Academia | (1) National University of Defense Technology; (2) The Hong Kong University of Science and Technology; (3) UC Santa Barbara; (4) New York University
Pseudocode | Yes | We show the training algorithm in App. B and a running example in App. K. (An illustrative PATE-style aggregation sketch is given below the table.)
Open Source Code | Yes | Our code is publicly accessible. (See details about hyperparameters in App. G.) [Footnote 7: https://github.com/tianzhiliang/SeqPATE]
Open Datasets | Yes | We evaluate our model on two datasets. AirDialog [47] consists of 1M utterances from customer service dialogs on flight booking; Europarl_v6 consists of 2M English sentences collected from the European Parliament. (See details about datasets in App. E.) [Footnote 5: www.statmt.org/europarl] [Reference 47: Wei Wei, Quoc Le, Andrew Dai, and Jia Li. AirDialogue: An environment for goal-oriented dialogue research. In EMNLP, pages 3844–3854, 2018.]
Dataset Splits | Yes | The coefficient λ that balances supervision for the teacher and the pseudo-data (Eq. 4) is set to 20, where we have tuned it on the validation set of the public pseudo-data.
Hardware Specification | No | The paper mentions running SeqPATE "on a single GPU" but does not specify the make, model, or any other details about the GPU or other hardware components (e.g., CPU, memory).
Software Dependencies | No | The paper mentions fine-tuning models from the "pre-trained GPT-2 model [35]" and using the "Adam [23]" optimizer. However, it does not name specific versions of GPT-2, the underlying deep learning frameworks (e.g., PyTorch, TensorFlow), or the programming language. (A dependency sketch assuming a Hugging Face/PyTorch stack is given below the table.)
Experiment Setup | Yes | The batch size is 32 for all compared methods except GC [24] (GC [24] requires 2048). We use Adam [23] and adjust the initial learning rate within a range of 10⁻³ to 10⁻⁶ for all methods. The δ mentioned in Sec. 5 for all DP methods is 10⁻⁶. The coefficient λ... is set to 20... The default number of teacher models is 2k... The threshold p is 0.95. (A configuration sketch collecting these values is given below the table.)
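
For convenience, a minimal configuration sketch collecting the hyperparameters quoted in the Experiment Setup row. The class and field names are hypothetical and are not taken from the released code; they only restate the reported values.

```python
# Hypothetical configuration sketch for the reported experiment setup.
# Names are illustrative; values are the ones quoted from the paper.
from dataclasses import dataclass

@dataclass
class SeqPATEConfig:
    batch_size: int = 32               # all compared methods except GC, which uses 2048
    learning_rate: float = 1e-4        # tuned per method within the reported 1e-3 to 1e-6 range
    dp_delta: float = 1e-6             # delta used for all DP methods
    pseudo_data_lambda: float = 20.0   # coefficient balancing teacher supervision and pseudo-data (Eq. 4)
    num_teachers: int = 2000           # default number of teacher models
    top_p_threshold: float = 0.95      # candidate-filtering threshold p

config = SeqPATEConfig()
print(config)
```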
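The paper's actual training algorithm appears in its App. B and is not reproduced on this page. The sketch below only illustrates the general PATE recipe at a single decoding step, average the teachers' next-token distributions, restrict them to a small candidate set using a threshold p, add noise, and renormalize, under assumed function names and an arbitrary noise scale; it is not the authors' implementation of SeqPATE.

```python
# Illustrative PATE-style aggregation of teacher next-token predictions (assumed sketch).
import numpy as np

def aggregate_next_token_distribution(teacher_probs, top_p=0.95, noise_scale=0.1):
    """teacher_probs: (num_teachers, vocab_size) array; each row is the next-token
    distribution predicted by one teacher trained on a disjoint private shard."""
    mean_probs = teacher_probs.mean(axis=0)

    # Keep the smallest set of tokens whose cumulative probability reaches top_p,
    # shrinking the candidate set before noise is added.
    order = np.argsort(mean_probs)[::-1]
    cumulative = np.cumsum(mean_probs[order])
    keep = order[: int(np.searchsorted(cumulative, top_p)) + 1]

    # Perturb the surviving probabilities with Gaussian noise, then renormalize so the
    # student has a valid distribution to distill from.
    noisy = np.zeros_like(mean_probs)
    noisy[keep] = mean_probs[keep] + np.random.normal(0.0, noise_scale, size=keep.shape)
    noisy = np.clip(noisy, 0.0, None)
    total = noisy.sum()
    return noisy / total if total > 0 else np.full_like(mean_probs, 1.0 / mean_probs.size)

# Usage with random stand-in teacher outputs (2,000 teachers is the paper's default;
# the tiny vocabulary here only keeps the example light).
rng = np.random.default_rng(0)
fake_teacher_probs = rng.random((2000, 1000))
fake_teacher_probs /= fake_teacher_probs.sum(axis=1, keepdims=True)
student_target = aggregate_next_token_distribution(fake_teacher_probs)
```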
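The paper fine-tunes from a pre-trained GPT-2 with Adam but, as the Software Dependencies row notes, does not name framework versions or languages. The following is a minimal sketch assuming the Hugging Face transformers + PyTorch stack (an assumption, not something the paper states); the training text is a stand-in, not the paper's corpora.

```python
# Minimal GPT-2 fine-tuning step, assuming Hugging Face transformers + PyTorch.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # within the reported 1e-3 to 1e-6 range

text = "i would like to book a flight to denver ."  # stand-in utterance, not the paper's data
batch = tokenizer(text, return_tensors="pt")

model.train()
outputs = model(**batch, labels=batch["input_ids"])  # standard language-modeling loss
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```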