reproducibilityindex.ai

Differentially Private Synthetic Data via Foundation Model APIs 2: Text

Authors: Chulin Xie, Zinan Lin, Arturs Backurs, Sivakanth Gopi, Da Yu, Huseyin A Inan, Harsha Nori, Haotian Jiang, Huishuai Zhang, Yin Tat Lee, Bo Li, Sergey Yekhanin

ICML 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct comprehensive experiments on three benchmark datasets. Our results demonstrate that AUG-PE produces DP synthetic text that yields competitive utility with the SOTA DP finetuning baselines.
Researcher Affiliation	Collaboration	(1) University of Illinois Urbana-Champaign; (2) Microsoft Research; (3) Sun Yat-sen University; (4) University of Chicago.
Pseudocode	Yes	Algorithm 1 Augmented Private Evolution (AUG-PE)
Open Source Code	Yes	Our code and data are available at https://github.com/AI-secure/aug-pe.
Open Datasets	Yes	Datasets. We evaluate AUG-PE on three datasets: Yelp Review (Inc, 2023), OpenReview, and PubMed abstracts.
Dataset Splits	Yes	The number of train/val/test samples and the label information are given in Tb. 10.
Hardware Specification	Yes	It takes 1764 GPU hours on 32GB NVIDIA V100 GPUs to finetune GPT-2-Large on Yelp.
Software Dependencies	No	The paper mentions specific models (e.g., 'sentence-transformer', 'RoBERTa-base', 'BERT-Mini and BERT-Small') and their developers/citations, but it does not provide version numbers for general software components or libraries (e.g., Python, PyTorch).
Experiment Setup	Yes	We set the max sequence length as 512, the batch size as 64, the learning rate as 3e-5, and the number of epochs as 5 for Yelp and 10 for OpenReview.
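
The hyperparameters quoted in the Experiment Setup row can be recorded as a small configuration object. The following is a minimal sketch only: the names `FinetuneConfig`, `EPOCHS_BY_DATASET`, and `make_config` are hypothetical and are not taken from the aug-pe repository. The row does not state which training stage or model these settings apply to, and it quotes no epoch count for PubMed, so the sketch leaves both out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinetuneConfig:
    """Hyperparameters as quoted in the Experiment Setup row (hypothetical container)."""
    dataset: str
    num_epochs: int
    max_seq_length: int = 512      # "max sequence length as 512"
    batch_size: int = 64           # "batch size as 64"
    learning_rate: float = 3e-5    # "learning rate as 3e-5"


# Epoch counts are quoted only for these two datasets.
EPOCHS_BY_DATASET = {"yelp": 5, "openreview": 10}


def make_config(dataset: str) -> FinetuneConfig:
    """Build a config for a dataset whose epoch count is quoted in the row."""
    key = dataset.lower()
    if key not in EPOCHS_BY_DATASET:
        # No value is quoted for other datasets (e.g. PubMed), so refuse to guess.
        raise KeyError(f"no epoch count quoted for dataset {dataset!r}")
    return FinetuneConfig(dataset=key, num_epochs=EPOCHS_BY_DATASET[key])


if __name__ == "__main__":
    for name in EPOCHS_BY_DATASET:
        print(make_config(name))
```

Raising an error for datasets without a quoted epoch count, rather than falling back to a default, keeps the sketch from implying a setting the summary does not report.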