Privacy-Preserving In-Context Learning for Large Language Models
Authors: Tong Wu, Ashwinee Panda, Jiachen T. Wang, Prateek Mittal
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate DP-ICL on four text classification benchmarks and two language generation tasks, and our empirical results show that DP-ICL achieves a strong utility-privacy tradeoff. |
| Researcher Affiliation | Academia | Tong Wu*, Ashwinee Panda*, Jiachen T. Wang*, Prateek Mittal, Princeton University, {tongwu,ashwinee,tianhaowang,pmittal}@princeton.edu |
| Pseudocode | Yes | Algorithm 1: Differentially Private In-Context Learning (Meta Algorithm); a minimal sketch of this aggregation step appears after the table. |
| Open Source Code | Yes | Our code is available at https://github.com/tongwu2020/DP-ICL |
| Open Datasets | Yes | We evaluate our DP-ICL paradigm with these approaches for private aggregation on datasets spanning text classification (SST-2, Amazon, AGNews, TREC), document question-answering (DocVQA), and document summarization (SAMSum). |
| Dataset Splits | Yes | We use the training set, which contains 221,329 examples, and evaluate performance on 100 examples randomly selected from the validation set. |
| Hardware Specification | No | The paper mentions using LLM APIs (e.g., GPT-3 models, OpenLLaMA-13B) but does not specify the underlying hardware used for running the experiments. |
| Software Dependencies | No | The paper mentions using specific LLM models and APIs (e.g., OpenAI's text-embedding-ada-002, GPT-3, OpenLLaMA-13B) but does not specify software dependencies like programming languages or libraries with version numbers. |
| Experiment Setup | Yes | We primarily focus on in-context learning with 4 exemplars (4-shot) and 10,000 queries. We set the number of exemplar-query pairs to 10 after subsampling and selected ε = {1, 3, 8} and δ = 10⁻⁴ to achieve different levels of privacy. (A hedged noise-calibration example also follows the table.) |
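
To make the "Meta Algorithm" and "private aggregation" rows above concrete, here is a minimal sketch of DP-ICL-style private voting for text classification. It is not the paper's exact procedure: `icl_predict` is a hypothetical wrapper around an LLM API, and the subset count, noise scale, and partitioning scheme are illustrative placeholders.

```python
# Minimal sketch of DP-ICL-style private aggregation for text classification.
# Assumption: `icl_predict(demos, query)` is a hypothetical LLM call that
# returns a class label for a prompt built from one subset of private
# exemplars plus the test query.
import numpy as np

def dp_icl_classify(query, exemplars, labels, icl_predict,
                    num_subsets=10, sigma=1.0, rng=None):
    """Report-noisy-max style aggregation over an ensemble of ICL predictions.

    exemplars : list of private demonstration pairs (input, label)
    labels    : candidate class labels, e.g. ["positive", "negative"]
    sigma     : std. dev. of the Gaussian noise added to each vote count
    """
    rng = rng or np.random.default_rng()

    # 1. Partition the private exemplars into disjoint subsets so that each
    #    private record influences at most one ensemble member.
    idx = rng.permutation(len(exemplars))
    subsets = np.array_split(idx, num_subsets)

    # 2. Query the LLM once per subset and tally the predicted labels.
    votes = np.zeros(len(labels))
    for subset in subsets:
        demos = [exemplars[i] for i in subset]
        pred = icl_predict(demos, query)   # one (non-private) ICL call
        votes[labels.index(pred)] += 1

    # 3. Privately release the consensus label via a noisy argmax.
    noisy_votes = votes + rng.normal(scale=sigma, size=len(labels))
    return labels[int(np.argmax(noisy_votes))]
```

Only the aggregated, noised vote histogram leaves the ensemble, which is what lets a single in-context prediction be released with a differential privacy guarantee.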
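
The ε = {1, 3, 8}, δ = 10⁻⁴ settings in the experiment-setup row can be connected to a concrete noise scale. The snippet below is a hedged, self-contained illustration using the analytic Gaussian mechanism of Balle & Wang (2018); it is not the paper's accountant, it ignores composition over the 10,000 queries and the amplification from subsampling, and the sensitivity of √2 (one changed exemplar moves one vote between two bins) is an assumption.

```python
# Hedged illustration: calibrate the Gaussian noise scale sigma for a single
# noisy-argmax release at a target (epsilon, delta), using the exact delta
# characterization of the Gaussian mechanism (Balle & Wang, 2018).
import math

def gaussian_delta(sigma, epsilon, sensitivity):
    """Exact delta achieved by the Gaussian mechanism at a given epsilon."""
    phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    a = sensitivity / (2.0 * sigma)
    b = epsilon * sigma / sensitivity
    return phi(a - b) - math.exp(epsilon) * phi(-a - b)

def calibrate_sigma(epsilon, delta, sensitivity=math.sqrt(2),
                    lo=1e-3, hi=1e3, iters=100):
    """Binary-search the smallest sigma whose achieved delta is below target."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if gaussian_delta(mid, epsilon, sensitivity) > delta:
            lo = mid   # too little noise: increase sigma
        else:
            hi = mid
    return hi

# Per-release noise scales for the epsilon values reported in the paper.
for eps in (1, 3, 8):
    print(f"epsilon={eps}: sigma ≈ {calibrate_sigma(eps, 1e-4):.3f}")
```

In practice the per-query privacy cost must still be composed across all answered queries, which is why mechanisms like subsampling and limiting the number of released answers matter for the overall budget.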