Privacy-Preserving In-Context Learning for Large Language Models

Authors: Tong Wu, Ashwinee Panda, Jiachen T. Wang, Prateek Mittal

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate DP-ICL on four text classification benchmarks and two language generation tasks, and our empirical results show that DP-ICL achieves a strong utility-privacy tradeoff.
Researcher Affiliation Academia Tong Wu*, Ashwinee Panda*, Jiachen T. Wang*, Prateek Mittal — Princeton University {tongwu,ashwinee,tianhaowang,pmittal}@princeton.edu
Pseudocode Yes Algorithm 1 Differentially Private In-Context Learning (Meta Algorithm)
Open Source Code Yes 1Our code is available at https://github.com/tongwu2020/DP-ICL
Open Datasets Yes We evaluate our DP-ICL paradigm with these approaches for private aggregation on datasets spanning text classification (SST-2, Amazon, AGNews, TREC), document question-answering (DocVQA), and document summarization (SAMSum).
Dataset Splits Yes The training set contains 221,329 examples; performance is evaluated on 100 examples randomly selected from the validation set.
Hardware Specification No The paper mentions using LLM APIs (e.g., GPT-3 models, OpenLLaMA-13B) but does not specify the underlying hardware used for running the experiments.
Software Dependencies No The paper mentions using specific LLM models and APIs (e.g., OpenAI's text-embedding-ada-002, GPT-3, OpenLLaMA-13B) but does not specify software dependencies such as programming languages or libraries with version numbers.
Experiment Setup Yes We primarily focus on in-context learning with 4 exemplars (4-shot) and 10,000 queries. We set the number of exemplar-query pairs to 10 after subsampling and selected ε = {1, 3, 8} and δ = 10⁻⁴ to achieve different levels of privacy.
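For the classification setting described above, DP-ICL's private aggregation can be illustrated with a minimal sketch: each of the subsampled exemplar subsets produces one prediction, the class-vote histogram is perturbed with Gaussian noise, and only the noisy argmax is released. This is a simplified illustration under stated assumptions, not the paper's actual implementation; the function and parameter names (`dp_icl_aggregate`, `sigma`) are hypothetical, and the noise scale must separately be calibrated to the target (ε, δ).

```python
import numpy as np

def dp_icl_aggregate(votes, num_classes, sigma, rng=None):
    """Privately aggregate per-subset ICL predictions (illustrative sketch).

    votes:       list of predicted class indices, one per exemplar subset
    num_classes: number of label classes
    sigma:       Gaussian noise scale (to be calibrated to the (eps, delta) budget)
    """
    rng = np.random.default_rng(rng)
    # Build the vote histogram over classes.
    hist = np.bincount(np.asarray(votes), minlength=num_classes).astype(float)
    # Gaussian mechanism: perturb each class count, then release only the argmax.
    noisy = hist + rng.normal(scale=sigma, size=num_classes)
    return int(np.argmax(noisy))

# Example: 10 subset predictions on a binary task (e.g., SST-2 sentiment).
preds = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
label = dp_icl_aggregate(preds, num_classes=2, sigma=1.0)
```

With `sigma = 0` the function reduces to a plain majority vote; increasing `sigma` trades utility for stronger privacy.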