Differentially Private Synthetic Data via Foundation Model APIs 2: Text
Authors: Chulin Xie, Zinan Lin, Arturs Backurs, Sivakanth Gopi, Da Yu, Huseyin A. Inan, Harsha Nori, Haotian Jiang, Huishuai Zhang, Yin Tat Lee, Bo Li, Sergey Yekhanin
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct comprehensive experiments on three benchmark datasets. Our results demonstrate that AUG-PE produces DP synthetic text that yields competitive utility with the SOTA DP finetuning baselines. |
| Researcher Affiliation | Collaboration | University of Illinois Urbana-Champaign, Microsoft Research, Sun Yat-sen University, University of Chicago. |
| Pseudocode | Yes | Algorithm 1 Augmented Private Evolution (AUG-PE) (see the sketch after the table). |
| Open Source Code | Yes | Our code and data are available at https://github.com/AI-secure/aug-pe. |
| Open Datasets | Yes | Datasets. We evaluate AUG-PE on three datasets: Yelp Review (Inc, 2023), OpenReview, and PubMed abstracts. |
| Dataset Splits | Yes | The number of train/val/test samples and label information are in Tab. 10. |
| Hardware Specification | Yes | it takes 1764 GPU hours on a 32GB NVIDIA V100 to finetune GPT-2-Large on Yelp |
| Software Dependencies | No | The paper mentions specific models (e.g., 'sentence-transformer', 'RoBERTa-base', 'BERT-Mini and BERT-Small') and their developers/citations, but it does not provide specific version numbers for general software components or libraries (e.g., Python, PyTorch). |
| Experiment Setup | Yes | We set the max sequence length as 512, the batch size as 64, the learning rate as 3e-5, and the number of epochs as 5 for Yelp and 10 for Open Review. (These values are reflected in the config sketch below.) |
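For reference, below is a minimal sketch of one AUG-PE evolution step, assuming the standard Private Evolution loop (DP nearest-neighbor voting over embeddings, resampling, then LLM-generated variations). The helpers `embed` and `variation_api` are hypothetical stand-ins for the embedding model and the LLM API, and the Gaussian noise scale is illustrative rather than the paper's exact privacy accounting.

```python
import numpy as np

def dp_nn_histogram(private_emb, synthetic_emb, noise_multiplier, rng):
    """Each private sample votes for its nearest synthetic sample; Gaussian
    noise makes the resulting vote histogram differentially private."""
    # Pairwise squared distances between private and synthetic embeddings.
    d2 = ((private_emb[:, None, :] - synthetic_emb[None, :, :]) ** 2).sum(-1)
    votes = np.bincount(d2.argmin(axis=1), minlength=len(synthetic_emb)).astype(float)
    votes += rng.normal(scale=noise_multiplier, size=votes.shape)  # DP noise
    return np.clip(votes, 0.0, None)

def aug_pe_step(private_emb, synthetic_texts, embed, variation_api,
                noise_multiplier=1.0, seed=0):
    """One evolution step: resample synthetic texts in proportion to their
    noisy vote counts, then ask the LLM API for variations of the survivors."""
    rng = np.random.default_rng(seed)
    hist = dp_nn_histogram(private_emb, embed(synthetic_texts), noise_multiplier, rng)
    total = hist.sum()
    probs = hist / total if total > 0 else np.full(len(hist), 1.0 / len(hist))
    idx = rng.choice(len(synthetic_texts), size=len(synthetic_texts), p=probs)
    return [variation_api(synthetic_texts[i]) for i in idx]  # next generation
```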
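The downstream training hyperparameters quoted in the Experiment Setup row could be expressed roughly as follows, assuming a Hugging Face `Trainer`-based pipeline (the paper does not specify the framework or its version); the output directory name is hypothetical.

```python
from transformers import TrainingArguments

# Hypothetical TrainingArguments mirroring the quoted downstream setup.
training_args = TrainingArguments(
    output_dir="downstream-yelp",     # hypothetical output path
    per_device_train_batch_size=64,   # batch size 64
    learning_rate=3e-5,               # learning rate 3e-5
    num_train_epochs=5,               # 5 epochs for Yelp (10 for OpenReview)
)

# The max sequence length of 512 is enforced at tokenization time, e.g.:
# tokenizer(texts, truncation=True, max_length=512)
```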