Recovering Private Text in Federated Learning of Language Models

Authors: Samyak Gupta, Yangsibo Huang, Zexuan Zhong, Tianyu Gao, Kai Li, Danqi Chen

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental (6 experiments) | "Model and datasets. We evaluate the proposed attack with the GPT-2 base (117M parameters) model (Radford et al., 2019) on two language modeling datasets, including WikiText-103 (Merity et al., 2017) and the Enron Email dataset (Klimt & Yang, 2004). Both datasets are publicly available for research uses." and "Evaluation metrics. We use the following metrics to evaluate the attack performance: (a) ROUGE (Lin, 2004)... (b) We also propose to use named entity recovery ratio (NERR)..." (illustrative metric sketch below the table)
Researcher Affiliation | Academia | Samyak Gupta (Princeton University, samyakg@cs.princeton.edu), Yangsibo Huang (Princeton University, yangsibo@princeton.edu), Zexuan Zhong (Princeton University, zzhong@cs.princeton.edu), Tianyu Gao (Princeton University, tianyug@cs.princeton.edu), Kai Li (Princeton University, li@cs.princeton.edu), Danqi Chen (Princeton University, danqic@cs.princeton.edu)
Pseudocode | Yes | "We provide a detailed algorithm in Appendix A." (a sketch of one common building block of such attacks appears below the table)
Open Source Code | Yes | "Our code is publicly available at https://github.com/Princeton-SysML/FILM."
Open Datasets | Yes | "We evaluate the proposed attack with the GPT-2 base (117M parameters) model (Radford et al., 2019) on two language modeling datasets, including WikiText-103 (Merity et al., 2017) and the Enron Email dataset (Klimt & Yang, 2004). Both datasets are publicly available for research uses." (illustrative loading sketch below the table)
Dataset Splits | No | The paper states "All models were trained using early stopping, i.e., models were trained until the loss of the model on the evaluation set increased," which implies an evaluation/validation set, but it does not specify explicit train/validation/test splits (e.g., percentages or sample counts).
Hardware Specification | Yes | "We note that the running time of our algorithm is quite fast, and we can recover a single sentence in under a minute using an Nvidia 2080TI GPU."
Software Dependencies | No | The paper mentions using the GPT-2 model and implies a software implementation, but it does not specify any software dependencies with version numbers (e.g., "PyTorch 1.9", "Python 3.8").
Experiment Setup | Yes | "Unless otherwise noted, we train the model on these sentences for 90,000 iterations using an initial learning rate of 1 × 10^-5, with a linearly decayed learning rate scheduler. All models were trained using early stopping, i.e., models were trained until the loss of the model on the evaluation set increased." and "Our experiments demonstrate high-fidelity recovery of a single sentence is feasible, and recovery of significant portions of sentences for training batches of up to 128 sentences." and "We analyze the attack performance with different batch sizes, the number of training data points, and the number of training epochs." (illustrative training-loop sketch below the table)
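
For context on the "Open Datasets" and "Research Type" rows: the model and corpora quoted there are publicly downloadable. The following is a minimal loading sketch, not the authors' code. It assumes the Hugging Face `transformers` and `datasets` libraries, which the paper itself does not name; the authors' actual pipeline is in the linked FILM repository.

```python
# Minimal sketch (not the authors' code): load GPT-2 base (117M parameters)
# and the WikiText-103 corpus, assuming the Hugging Face `transformers` and
# `datasets` libraries.
from transformers import GPT2LMHeadModel, GPT2TokenizerFast
from datasets import load_dataset

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")   # "gpt2" = the 117M base checkpoint
model = GPT2LMHeadModel.from_pretrained("gpt2")

# WikiText-103 (Merity et al., 2017); the "raw" variant keeps original casing and punctuation.
wikitext = load_dataset("wikitext", "wikitext-103-raw-v1")

# The Enron Email dataset (Klimt & Yang, 2004) is distributed as raw text/CSV;
# the authors' preprocessing is described in the paper and not reproduced here.
```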
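
The evaluation metrics quoted under "Research Type" are ROUGE and the proposed named entity recovery ratio (NERR). The sketch below computes ROUGE with the `rouge_score` package and approximates NERR with spaCy as an assumed entity extractor; the NERR function is only a plausible approximation, and the paper's exact definition should be taken from the paper itself.

```python
# Metric sketch: ROUGE via the `rouge_score` package, plus an *approximation*
# of the paper's named entity recovery ratio (NERR) using spaCy NER.
from rouge_score import rouge_scorer
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed English NER model (must be installed separately)

def rouge(reference: str, recovered: str) -> dict:
    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
    return scorer.score(reference, recovered)

def approx_nerr(reference: str, recovered: str) -> float:
    """Fraction of named entities in the reference text that also appear in the
    recovered text -- an approximation, not the paper's exact formula."""
    ref_ents = {ent.text.lower() for ent in nlp(reference).ents}
    if not ref_ents:
        return 1.0
    recovered_lower = recovered.lower()
    return sum(ent in recovered_lower for ent in ref_ents) / len(ref_ents)
```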
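
The "Experiment Setup" row quotes a concrete training recipe: 90,000 iterations, an initial learning rate of 1 × 10^-5 with linear decay, and early stopping once the evaluation loss increases. Below is a minimal PyTorch sketch of that recipe; the optimizer (AdamW), the absence of warmup, and the evaluation interval are assumptions not stated in the paper.

```python
# Sketch of the quoted training configuration. Optimizer choice, warmup, and
# evaluation frequency are assumptions; only the iteration count, initial
# learning rate, linear decay, and early-stopping rule come from the paper.
import torch
from transformers import get_linear_schedule_with_warmup

def train_with_early_stopping(model, train_batches, eval_loss_fn,
                              total_steps=90_000, lr=1e-5, eval_every=1_000):
    # train_batches yields dicts with "input_ids" and "attention_mask" tensors;
    # eval_loss_fn(model) returns the loss on the evaluation set.
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    scheduler = get_linear_schedule_with_warmup(
        optimizer, num_warmup_steps=0, num_training_steps=total_steps)

    best_eval_loss = float("inf")
    for step, batch in enumerate(train_batches):
        if step >= total_steps:
            break
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()

        # Early stopping: train until the loss on the evaluation set increases.
        if (step + 1) % eval_every == 0:
            eval_loss = eval_loss_fn(model)
            if eval_loss > best_eval_loss:
                break
            best_eval_loss = eval_loss
    return model
```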
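
Finally, regarding the "Pseudocode" row: the detailed attack algorithm is given in Appendix A and the released code, and is not reproduced here. The sketch below shows only one well-known building block that gradient-leakage attacks on language models commonly rely on: in federated learning the server observes the client's gradient of the word-embedding matrix, and its non-zero rows reveal which token ids occurred in the private batch. Whether and how exactly the paper uses this observation should be checked against Appendix A.

```python
# Illustrative building block, NOT the full algorithm from Appendix A:
# non-zero rows of the word-embedding gradient identify which token ids
# appeared in the client's private training batch.
import torch

def recover_token_ids_from_embedding_grad(embedding_grad: torch.Tensor,
                                           eps: float = 1e-9) -> list[int]:
    """embedding_grad: gradient of the (vocab_size x hidden_dim) embedding
    matrix as observed by the server. Returns candidate token ids."""
    row_norms = embedding_grad.norm(dim=1)           # one norm per vocabulary entry
    return torch.nonzero(row_norms > eps).flatten().tolist()
```

Ordering the recovered tokens back into fluent sentences is the substantially harder step, and that is what the paper's full recovery algorithm addresses.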