DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer

Authors: Junyuan Hong, Jiachen T. Wang, Chenhui Zhang, Zhangheng LI, Bo Li, Zhangyang Wang

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, our method presents an outstanding performance on multiple language tasks. Prompts tuned on open-source Vicuna-7b (Chiang et al., 2023) can achieve significant performance gains across 4 tasks after transfer to closed-source heterogeneous-architecture models (GPT3.5) or open-source models (Llama-2 (Touvron et al., 2023) or Vicuna-33b).
Researcher Affiliation | Academia | University of Texas at Austin, Princeton University, MIT, University of Chicago
Pseudocode | Yes | Algorithm 1 DP-OPT (ϵ0 < ∞) or OPT (ϵ0 = ∞), Algorithm 2 DP-Ens Gen: Differentially-Private Ensemble Generation, Algorithm 3 Deep Language Network (DLN-1) and potential privacy leakage, Algorithm 4 Limited Domain (h; k̄, k, ϵ0, δ0)
Open Source Code | Yes | Codes are available at https://github.com/VITA-Group/DP-OPT.
Open Datasets | Yes | We use SST-2 from the GLUE benchmark (Wang et al., 2018) which includes 6.7 × 10^4 samples. Trec and Mpqa (Lu et al., 2021) and Disaster (Bansal et al., 2019) are smaller datasets consisting of fewer training samples.
Dataset Splits | Yes | The validation set is selected from the training set per random seed. The ratio of validation with respect to the training set is included in brackets. For all trainable methods, we hold out 5% of training data for validation and report accuracy on the original test set.
Hardware Specification | No | Portions of this research were conducted with the advanced computing resources provided by Texas A&M High Performance Research Computing, a composable computing cluster (He et al., 2023). This statement refers to a high-performance computing resource but does not provide specific hardware models like GPUs or CPUs.
Software Dependencies | No | For DPSGD, we adopt dp-transformers package to reduce the memory overhead caused by gradient clipping (Wutschitz et al., 2022) and tune the hyper-parameters for each dataset. The version number for `dp-transformers` is not specified.
Experiment Setup | Yes | Detailed parameters for DP-OPT are given in Table 5. As the total δ is determined by the sample size, we mainly tune ϵ0 and δ0 for each dataset such that enough tokens can be generated in DP-OPT. For DP-OPT and OPT on Mpqa, we set the repetition penalty to 1.1 to avoid repeated words. Table 5 lists Max new tokens, Batch size, ϵ0, δ0, and temperature for each task.
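As a rough illustration of the private ensemble generation named in the Pseudocode row (DP-Ens Gen with Limited Domain selection), the sketch below picks one next token from an ensemble's vote histogram using the exponential mechanism. This is a simplified stand-in rather than the paper's Limited Domain mechanism, and the function name and `epsilon0` usage are illustrative only.

```python
import numpy as np

def dp_select_token(vote_counts, epsilon0, rng=None):
    """Pick one token from an ensemble's vote histogram with the exponential
    mechanism (utility = vote count, sensitivity 1). Simplified stand-in for
    the paper's Limited Domain-based selection, for illustration only."""
    rng = rng or np.random.default_rng()
    tokens = list(vote_counts)
    counts = np.array([vote_counts[t] for t in tokens], dtype=float)
    # Exponential mechanism: P(token) proportional to exp(epsilon0 * count / 2)
    logits = epsilon0 * counts / 2.0
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return tokens[rng.choice(len(tokens), p=probs)]

# Example: next-token votes collected from an ensemble of prompt proposals.
votes = {"Classify": 7, "Label": 2, "Answer": 1}
print(dp_select_token(votes, epsilon0=1.0))
```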
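The Open Datasets and Dataset Splits rows can be illustrated together: SST-2 is loaded from the GLUE benchmark and 5% of the training data is held out for validation per random seed. The Hugging Face `datasets` calls below are one plausible way to do this, not necessarily the repository's own loading code.

```python
from datasets import load_dataset

seed = 0  # splits are re-drawn per random seed, as in the quoted setup

# SST-2 from the GLUE benchmark (~6.7e4 training samples).
sst2 = load_dataset("glue", "sst2")

# Hold out 5% of the training data for validation; accuracy is then
# reported on the original evaluation split.
split = sst2["train"].train_test_split(test_size=0.05, seed=seed)
train_set, val_set = split["train"], split["test"]
```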
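For the Software Dependencies row, the memory overhead attributed to gradient clipping in DPSGD comes from needing per-example gradients. The snippet below is a plain-PyTorch sketch of that step (per-sample clipping plus Gaussian noise); it is not the dp-transformers API, and `clip_norm` / `noise_multiplier` are placeholder values.

```python
import torch

def dpsgd_step(model, loss_fn, inputs, labels, optimizer,
               clip_norm=1.0, noise_multiplier=1.0):
    """One DP-SGD step: per-example gradient clipping + Gaussian noise.
    Plain-PyTorch illustration of what packages such as dp-transformers
    implement with less memory overhead; hyper-parameters are placeholders."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]
    for x, y in zip(inputs, labels):          # size-1 microbatches => per-sample grads
        model.zero_grad()
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        grads = [p.grad.detach().clone() for p in params]
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (norm + 1e-12), max=1.0)
        for s, g in zip(summed, grads):
            s.add_(g * scale)                 # accumulate clipped gradients
    n = float(len(inputs))
    for p, s in zip(params, summed):
        noise = torch.randn_like(s) * noise_multiplier * clip_norm
        p.grad = (s + noise) / n              # noisy averaged gradient
    optimizer.step()
```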
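For the Experiment Setup row, the tuned generation hyper-parameters map onto standard Hugging Face `generate` arguments. In the sketch below, only the 1.1 repetition penalty comes from the quoted text; the checkpoint name, prompt, temperature, and max-new-tokens values are placeholders, since Table 5 is not reproduced here.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Vicuna-7b is the prompt-tuning model in the paper; this exact checkpoint
# name is an assumption for illustration.
model_name = "lmsys/vicuna-7b-v1.5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Write an instruction for classifying the sentiment of a review:",
                   return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.7,         # per-task value from Table 5 (placeholder here)
    max_new_tokens=50,       # per-task value from Table 5 (placeholder here)
    repetition_penalty=1.1,  # used for DP-OPT/OPT on Mpqa per the quoted setup
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```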