Differentially Private Synthetic Data via Foundation Model APIs 2: Text

Authors: Chulin Xie, Zinan Lin, Arturs Backurs, Sivakanth Gopi, Da Yu, Huseyin A. Inan, Harsha Nori, Haotian Jiang, Huishuai Zhang, Yin Tat Lee, Bo Li, Sergey Yekhanin

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct comprehensive experiments on three benchmark datasets. Our results demonstrate that AUG-PE produces DP synthetic text that yields competitive utility with the SOTA DP finetuning baselines.
Researcher Affiliation | Collaboration | 1. University of Illinois Urbana-Champaign, 2. Microsoft Research, 3. Sun Yat-sen University, 4. University of Chicago.
Pseudocode | Yes | Algorithm 1: Augmented Private Evolution (AUG-PE). (A hedged sketch of this loop is given after the table.)
Open Source Code | Yes | Our code and data are available at https://github.com/AI-secure/aug-pe.
Open Datasets | Yes | Datasets. We evaluate AUG-PE on three datasets: Yelp Review (Inc, 2023), OpenReview, and PubMed abstracts.
Dataset Splits | Yes | The number of train/val/test samples and label information are given in Tab. 10.
Hardware Specification | Yes | It takes 1764 GPU hours on a 32GB NVIDIA V100 to finetune GPT-2-Large on Yelp.
Software Dependencies | No | The paper mentions specific models (e.g., 'sentence-transformer', 'RoBERTa-base', 'BERT-Mini and BERT-Small') and their developers/citations, but it does not provide specific version numbers for general software components or libraries (e.g., Python, PyTorch).
Experiment Setup | Yes | We set the max sequence length to 512, the batch size to 64, the learning rate to 3e-5, and the number of epochs to 5 for Yelp and 10 for OpenReview. (A hedged configuration sketch is given below.)
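
The Pseudocode row above refers to Algorithm 1 (AUG-PE). The snippet below is a minimal sketch of the Private Evolution voting-and-variation loop that AUG-PE builds on, not the authors' implementation (see the linked repository). The helpers `random_api`, `variation_api`, and `embed` are hypothetical stand-ins for the LLM-API and embedding calls, their names and signatures are assumptions, and privacy accounting for the noise level is omitted.

```python
# Sketch of the Private Evolution loop underlying AUG-PE (assumptions noted above).
import numpy as np

def dp_nn_histogram(private_emb, synthetic_emb, noise_multiplier, rng):
    """Each private sample votes for its nearest synthetic sample; Gaussian noise
    is added to the vote counts so the histogram is differentially private."""
    votes = np.zeros(len(synthetic_emb))
    for e in private_emb:
        dists = np.linalg.norm(synthetic_emb - e, axis=1)
        votes[np.argmin(dists)] += 1
    noisy = votes + rng.normal(0.0, noise_multiplier, size=votes.shape)
    return np.clip(noisy, 0.0, None)

def aug_pe_sketch(private_texts, random_api, variation_api, embed,
                  num_samples=1000, iterations=10, noise_multiplier=10.0, seed=0):
    rng = np.random.default_rng(seed)
    private_emb = embed(private_texts)
    # Step 1: draw an initial synthetic population from the LLM with generic prompts.
    synthetic = random_api(num_samples)
    for _ in range(iterations):
        # Step 2: DP voting -- private data privately scores the current population.
        hist = dp_nn_histogram(private_emb, embed(synthetic), noise_multiplier, rng)
        if hist.sum() > 0:
            probs = hist / hist.sum()
        else:
            probs = np.full(len(synthetic), 1.0 / len(synthetic))
        # Step 3: resample the population in proportion to the noisy histogram.
        parents = [synthetic[i] for i in rng.choice(len(synthetic), size=num_samples, p=probs)]
        # Step 4: ask the LLM for variations (paraphrases / rewrites) of the survivors.
        synthetic = variation_api(parents)
    return synthetic
```

The Experiment Setup row quotes the downstream fine-tuning hyperparameters. The sketch below shows one way to wire those numbers (max length 512, batch size 64, learning rate 3e-5, 5 epochs for Yelp / 10 for OpenReview) into a Hugging Face Trainer; the choice of library, the `roberta-base` checkpoint, and the helper name `build_trainer` are assumptions for illustration, and `train_dataset`/`eval_dataset` are assumed to be `datasets.Dataset` objects with `text` and `label` columns.

```python
# Illustrative fine-tuning configuration using the quoted hyperparameters (assumptions noted above).
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

def build_trainer(train_dataset, eval_dataset, num_labels, dataset_name="yelp"):
    tokenizer = AutoTokenizer.from_pretrained("roberta-base")
    model = AutoModelForSequenceClassification.from_pretrained(
        "roberta-base", num_labels=num_labels)

    def tokenize(batch):
        # Truncate/pad to the max sequence length of 512 quoted in the paper.
        return tokenizer(batch["text"], truncation=True,
                         padding="max_length", max_length=512)

    args = TrainingArguments(
        output_dir="out",
        per_device_train_batch_size=64,   # batch size 64
        learning_rate=3e-5,               # learning rate 3e-5
        num_train_epochs=5 if dataset_name == "yelp" else 10,
    )
    return Trainer(model=model, args=args,
                   train_dataset=train_dataset.map(tokenize, batched=True),
                   eval_dataset=eval_dataset.map(tokenize, batched=True))
```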