DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer

Authors: Junyuan Hong, Jiachen T. Wang, Chenhui Zhang, Zhangheng LI, Bo Li, Zhangyang Wang

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, our method presents an outstanding performance on multiple language tasks. Prompts tuned on open-source Vicuna-7b (Chiang et al., 2023) can achieve significant performance gains across 4 tasks after transfer to closed-source heterogeneous-architecture models (GPT3.5) or open-source models (Llama-2 (Touvron et al., 2023) or Vicuna-33b).
Researcher Affiliation | Academia | University of Texas at Austin, Princeton University, MIT, University of Chicago
Pseudocode | Yes | Algorithm 1 DP-OPT (ϵ0 < ∞) or OPT (ϵ0 = ∞), Algorithm 2 DP-Ens Gen: Differentially-Private Ensemble Generation, Algorithm 3 Deep Language Network (DLN-1) and potential privacy leakage, Algorithm 4 Limited Domain (h; k̄, k, ϵ0, δ0)
Open Source Code | Yes | Codes are available at https://github.com/VITA-Group/DP-OPT.
Open Datasets | Yes | We use SST-2 from the GLUE benchmark (Wang et al., 2018) which includes 6.7 × 10^4 samples. Trec and Mpqa (Lu et al., 2021) and Disaster (Bansal et al., 2019) are smaller datasets consisting of fewer training samples.
Dataset Splits | Yes | The validation set is selected from the training set per random seed. The ratio of validation with respect to the training set is included in brackets. For all trainable methods, we hold out 5% of training data for validation and report accuracy on the original test set.
Hardware Specification | No | Portions of this research were conducted with the advanced computing resources provided by Texas A&M High Performance Research Computing, a composable computing cluster (He et al., 2023). This statement refers to a high-performance computing resource but does not provide specific hardware models like GPUs or CPUs.
Software Dependencies | No | For DPSGD, we adopt dp-transformers package to reduce the memory overhead caused by gradient clipping (Wutschitz et al., 2022) and tune the hyper-parameters for each dataset. The version number for `dp-transformers` is not specified.
Experiment Setup | Yes | Detailed parameters for DP-OPT are given in Table 5. As the total δ is determined by the sample size, we mainly tune ϵ0 and δ0 for each dataset such that enough tokens can be generated in DP-OPT. For DP-OPT and OPT on Mpqa, we set the repetition penalty to 1.1 to avoid repeated words. Table 5 lists Max new tokens, Batch size, ϵ0, δ0, and temperature for each task.
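As a rough illustration of the private ensemble generation named in the Pseudocode row (DP-Ens Gen with Limited Domain selection), the sketch below picks one next token from an ensemble's vote histogram using the exponential mechanism. This is a simplified stand-in rather than the paper's Limited Domain mechanism, and the function name and `epsilon0` usage are illustrative only.

```python
import numpy as np

def dp_select_token(vote_counts, epsilon0, rng=None):
    """Pick one token from an ensemble's vote histogram with the exponential
    mechanism (utility = vote count, sensitivity 1). Simplified stand-in for
    the paper's Limited Domain-based selection, for illustration only."""
    rng = rng or np.random.default_rng()
    tokens = list(vote_counts)
    counts = np.array([vote_counts[t] for t in tokens], dtype=float)
    # Exponential mechanism: P(token) proportional to exp(epsilon0 * count / 2)
    logits = epsilon0 * counts / 2.0
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return tokens[rng.choice(len(tokens), p=probs)]

# Example: next-token votes collected from an ensemble of prompt proposals.
votes = {"Classify": 7, "Label": 2, "Answer": 1}
print(dp_select_token(votes, epsilon0=1.0))
```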
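The Open Datasets and Dataset Splits rows can be illustrated together: SST-2 is loaded from the GLUE benchmark and 5% of the training data is held out for validation per random seed. The Hugging Face `datasets` calls below are one plausible way to do this, not necessarily the repository's own loading code.

```python
from datasets import load_dataset

seed = 0  # splits are re-drawn per random seed, as in the quoted setup

# SST-2 from the GLUE benchmark (~6.7e4 training samples).
sst2 = load_dataset("glue", "sst2")

# Hold out 5% of the training data for validation; accuracy is then
# reported on the original evaluation split.
split = sst2["train"].train_test_split(test_size=0.05, seed=seed)
train_set, val_set = split["train"], split["test"]
```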
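For the Software Dependencies row, the memory overhead attributed to gradient clipping in DPSGD comes from needing per-example gradients. The snippet below is a plain-PyTorch sketch of that step (per-sample clipping plus Gaussian noise); it is not the dp-transformers API, and `clip_norm` / `noise_multiplier` are placeholder values.

```python
import torch

def dpsgd_step(model, loss_fn, inputs, labels, optimizer,
               clip_norm=1.0, noise_multiplier=1.0):
    """One DP-SGD step: per-example gradient clipping + Gaussian noise.
    Plain-PyTorch illustration of what packages such as dp-transformers
    implement with less memory overhead; hyper-parameters are placeholders."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]
    for x, y in zip(inputs, labels):          # size-1 microbatches => per-sample grads
        model.zero_grad()
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        grads = [p.grad.detach().clone() for p in params]
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (norm + 1e-12), max=1.0)
        for s, g in zip(summed, grads):
            s.add_(g * scale)                 # accumulate clipped gradients
    n = float(len(inputs))
    for p, s in zip(params, summed):
        noise = torch.randn_like(s) * noise_multiplier * clip_norm
        p.grad = (s + noise) / n              # noisy averaged gradient
    optimizer.step()
```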
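For the Experiment Setup row, the tuned generation hyper-parameters map onto standard Hugging Face `generate` arguments. In the sketch below, only the 1.1 repetition penalty comes from the quoted text; the checkpoint name, prompt, temperature, and max-new-tokens values are placeholders, since Table 5 is not reproduced here.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Vicuna-7b is the prompt-tuning model in the paper; this exact checkpoint
# name is an assumption for illustration.
model_name = "lmsys/vicuna-7b-v1.5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Write an instruction for classifying the sentiment of a review:",
                   return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.7,         # per-task value from Table 5 (placeholder here)
    max_new_tokens=50,       # per-task value from Table 5 (placeholder here)
    repetition_penalty=1.1,  # used for DP-OPT/OPT on Mpqa per the quoted setup
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```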