Privacy-Preserving In-Context Learning for Large Language Models

Authors: Tong Wu, Ashwinee Panda, Jiachen T. Wang, Prateek Mittal

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate DP-ICL on four text classification benchmarks and two language generation tasks, and our empirical results show that DP-ICL achieves a strong utility-privacy tradeoff.
Researcher Affiliation Academia Tong Wu*, Ashwinee Panda*, Jiachen T. Wang*, Prateek Mittal — Princeton University {tongwu,ashwinee,tianhaowang,pmittal}@princeton.edu
Pseudocode Yes Algorithm 1 Differentially Private In-Context Learning (Meta Algorithm)
Open Source Code Yes 1Our code is available at https://github.com/tongwu2020/DP-ICL
Open Datasets Yes We evaluate our DP-ICL paradigm with these approaches for private aggregation on datasets spanning text classification (SST-2, Amazon, AGNews, TREC), document question-answering (DocVQA), and document summarization (SAMSum).
Dataset Splits Yes The training set contains 221,329 examples; performance is evaluated on 100 examples randomly selected from the validation set.
Hardware Specification No The paper mentions using LLM APIs (e.g., GPT-3 models, OpenLLaMA-13B) but does not specify the underlying hardware used for running the experiments.
Software Dependencies No The paper mentions using specific LLM models and APIs (e.g., OpenAI's text-embedding-ada-002, GPT-3, OpenLLaMA-13B) but does not specify software dependencies such as programming languages or libraries with version numbers.
Experiment Setup Yes We primarily focus on in-context learning with 4 exemplars (4-shot) and 10,000 queries. We set the number of exemplar-query pairs to 10 after subsampling and selected ε = {1, 3, 8} and δ = 10⁻⁴ to achieve different levels of privacy.
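For the classification setting described above, DP-ICL's private aggregation can be illustrated with a minimal sketch: each of the subsampled exemplar subsets produces one prediction, the class-vote histogram is perturbed with Gaussian noise, and only the noisy argmax is released. This is a simplified illustration under stated assumptions, not the paper's actual implementation; the function and parameter names (`dp_icl_aggregate`, `sigma`) are hypothetical, and the noise scale must separately be calibrated to the target (ε, δ).

```python
import numpy as np

def dp_icl_aggregate(votes, num_classes, sigma, rng=None):
    """Privately aggregate per-subset ICL predictions (illustrative sketch).

    votes:       list of predicted class indices, one per exemplar subset
    num_classes: number of label classes
    sigma:       Gaussian noise scale (to be calibrated to the (eps, delta) budget)
    """
    rng = np.random.default_rng(rng)
    # Build the vote histogram over classes.
    hist = np.bincount(np.asarray(votes), minlength=num_classes).astype(float)
    # Gaussian mechanism: perturb each class count, then release only the argmax.
    noisy = hist + rng.normal(scale=sigma, size=num_classes)
    return int(np.argmax(noisy))

# Example: 10 subset predictions on a binary task (e.g., SST-2 sentiment).
preds = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
label = dp_icl_aggregate(preds, num_classes=2, sigma=1.0)
```

With `sigma = 0` the function reduces to a plain majority vote; increasing `sigma` trades utility for stronger privacy.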