Split-and-Denoise: Protect large language model inference with local differential privacy

Authors: Peihua Mai, Ran Yan, Zhe Huang, Youjia Yang, Yan Pang

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate SnD's effectiveness in optimizing the privacy-utility tradeoff across various LLM architectures and diverse downstream tasks.
Researcher Affiliation | Academia | 1 National University of Singapore, 2 North China Electric Power University, 3 University of Southern California.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The implementation is available at https://github.com/NusIoraPrivacy/eaas-privacy.
Open Datasets | Yes | To train the denoise model, we use a combination of 20 datasets to better mimic generalized training scenarios, including Tweet Eval Offensive (Barbieri et al., 2020), Hate Speech 18 (de Gibert et al., 2018), Health Fact (Kotonya & Toni, 2020), Daily Dialogue (Li et al., 2017), etc. See the full list of datasets in Appendix A.3. (A data-loading sketch follows this table.)
Dataset Splits | No | The paper describes training and testing datasets but does not explicitly state the use of a validation dataset split or its size/percentage.
Hardware Specification | Yes | All the experiments are performed on a virtual server with Intel Xeon Platinum 8336C CPU and NVIDIA RTX A6000 GPU (CUDA version 12.2).
Software Dependencies | Yes | We utilize Python 3.9 as the programming language and PyTorch 2.2.2 as the underlying framework.
Experiment Setup | Yes | The hyperparameters of the denoise model are as follows: d_model (dimension of input embeddings and hidden states), d_ff (hidden dimension of the feed-forward network), d_kv (dimension of each head in the multi-head attention layer), n_head (number of heads in the multi-head attention layer), and L (number of layers). Table 8 lists the hyperparameters for each denoise model. (A configuration sketch follows this table.)
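To make the denoise-model training data concrete, the sketch below pulls a few of the named public datasets (Tweet Eval Offensive, Hate Speech 18, Health Fact, Daily Dialog) from the Hugging Face Hub and flattens them into one text corpus. The dataset identifiers, text-field names, and the collect_texts helper are assumptions for illustration only; the authors' repository may select and preprocess its 20 datasets differently.

```python
# Hedged sketch: assemble a small mixed-domain corpus from a few of the
# datasets named in the paper. Dataset ids, config names, and field names
# below are assumptions, not taken from the authors' code.
from datasets import load_dataset

# (dataset id, config name, text field) for each source.
DATASETS = [
    ("tweet_eval", "offensive", "text"),
    ("hate_speech18", None, "text"),
    ("health_fact", None, "claim"),
    ("daily_dialog", None, "dialog"),
]


def collect_texts(limit_per_dataset: int = 1000) -> list[str]:
    """Return a flat list of strings drawn from each dataset's train split."""
    texts: list[str] = []
    for path, config, field in DATASETS:
        ds = load_dataset(path, config, split="train")
        for record in ds.select(range(min(limit_per_dataset, len(ds)))):
            value = record[field]
            # daily_dialog stores a list of utterances per record; flatten it.
            texts.extend(value if isinstance(value, list) else [value])
    return texts


if __name__ == "__main__":
    corpus = collect_texts()
    print(f"Collected {len(corpus)} training texts for the denoise model")
```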
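The experiment-setup row lists the denoise model's architectural hyperparameters but not their values (those are in Table 8 of the paper). The sketch below shows one way those hyperparameters could be organized; DenoiseConfig, DenoiseModel, and every numeric value are illustrative assumptions. Note that PyTorch's built-in encoder layer ties the per-head dimension to d_model / n_head, so d_kv is kept only as documentation here.

```python
# Hedged sketch of a transformer denoiser parameterized by the hyperparameters
# named in the paper (d_model, d_ff, d_kv, n_head, L). Values are placeholders,
# NOT the values from Table 8.
from dataclasses import dataclass

import torch
import torch.nn as nn


@dataclass
class DenoiseConfig:
    d_model: int = 768   # dimension of input embeddings and hidden states
    d_ff: int = 2048     # hidden dimension of the feed-forward network
    d_kv: int = 64       # dimension of each attention head (documentation only)
    n_head: int = 12     # number of heads in multi-head attention
    n_layers: int = 4    # L: number of transformer layers


class DenoiseModel(nn.Module):
    """Toy encoder that maps noisy token embeddings back toward clean ones."""

    def __init__(self, cfg: DenoiseConfig):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=cfg.d_model,
            nhead=cfg.n_head,
            dim_feedforward=cfg.d_ff,
            batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=cfg.n_layers)
        self.out = nn.Linear(cfg.d_model, cfg.d_model)

    def forward(self, noisy_embeddings: torch.Tensor) -> torch.Tensor:
        # noisy_embeddings: (batch, seq_len, d_model)
        return self.out(self.encoder(noisy_embeddings))


if __name__ == "__main__":
    cfg = DenoiseConfig()
    model = DenoiseModel(cfg)
    x = torch.randn(2, 16, cfg.d_model)  # a batch of noisy embeddings
    print(model(x).shape)                # torch.Size([2, 16, 768])
```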