MERGE: Fast Private Text Generation

Authors: Zi Liang, Pinghui Wang, Ruofei Zhang, Nuo Xu, Shuo Zhang, Lifeng Xing, Haitao Bai, Ziyang Zhou

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that MERGE achieves a 26.5x speedup to the vanilla encrypted model under the sequence length 512, and reduces 80% communication cost, with an up to 10x speedup to state-of-the-art approximated models.
Researcher Affiliation | Academia | MOE KLINNS Lab, Xi'an Jiaotong University, Xi'an 710049, P. R. China. {liangzid, zs412082986, xlf20200926, haitao.bai, dakandao}@stu.xjtu.edu.cn, phwang@mail.xjtu.edu.cn, rfzhang@gmail.com, nxu@sei.xjtu.edu.cn
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Source code of experiments can be found here: https://github.com/liangzid/MERGE.
Open Datasets | Yes | We evaluate MERGE on three representative text generation tasks, including MultiWOZ (Eric et al. 2020), a human-human multi-turn task-oriented dialogue corpus, DailyDialog (Li et al. 2017), a multi-turn chitchat dataset, and CommonGen (Lin et al. 2020), a hard-constrained controlled text generation benchmark. (Loading sketch after this table.)
Dataset Splits | No | The paper mentions training parameters and datasets, but does not explicitly provide specific training/validation/test dataset splits (e.g., percentages or sample counts).
Hardware Specification | Yes | All experiments above are on a single 32 GB Nvidia Tesla V100 GPU. Following previous works (Li et al. 2022), for the experiments of private inference, we use two 32 GB Nvidia Tesla V100 GPUs to simulate the client and the server, with 10 GbE Ethernet bandwidth. (Two-party simulation sketch after this table.)
Software Dependencies | No | The paper mentions 'Hugging Face Transformers', 'CrypTen', and 'PyTorch' but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | We trained all models under the learning rate 3 × 10^−5, batch size 4 with 3 epochs... train MERGE with 50,000 steps under the learning rate 8 × 10^−5. We set the dropout rate to 0.6, λ to 0.75, and noise to 0.75. (Configuration sketch after this table.)
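The "Open Datasets" row lists MultiWOZ, DailyDialog, and CommonGen. Below is a minimal loading sketch; the Hugging Face Hub identifiers are assumptions based on the commonly used names for these corpora and are not taken from the paper or its repository.

```python
# Minimal sketch of loading the three evaluation corpora from the Hugging Face Hub.
# The dataset identifiers are assumed Hub names, not paths from the MERGE repository;
# newer versions of `datasets` may additionally require trust_remote_code=True.
from datasets import load_dataset

multiwoz = load_dataset("multi_woz_v22")    # MultiWOZ: multi-turn task-oriented dialogue
dailydialog = load_dataset("daily_dialog")  # DailyDialog: multi-turn chitchat
commongen = load_dataset("common_gen")      # CommonGen: constrained generation benchmark

for name, ds in [("MultiWOZ", multiwoz), ("DailyDialog", dailydialog), ("CommonGen", commongen)]:
    print(name, {split: len(ds[split]) for split in ds})
```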
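The "Hardware Specification" row describes a two-GPU client/server simulation for private inference, and the paper's implementation builds on CrypTen. The sketch below is a generic two-party CrypTen example (a toy encrypted linear layer, not the authors' benchmarking harness) illustrating how such a simulation is typically launched; the tensor shapes and the single-machine launcher are placeholders, whereas the paper ran the two parties on separate V100 GPUs over 10 GbE Ethernet.

```python
# Generic two-party secure computation sketch with CrypTen (not the paper's harness).
# Party 0 plays the client holding the input; party 1 plays the server holding a weight.
import torch
import crypten
import crypten.mpc as mpc
import crypten.communicator as comm

crypten.init()

@mpc.run_multiprocess(world_size=2)  # spawns two parties on one machine for simulation
def private_linear_layer():
    x_enc = crypten.cryptensor(torch.randn(1, 16), src=0)   # secret-share the client input
    w_enc = crypten.cryptensor(torch.randn(16, 16), src=1)  # secret-share the server weight
    y_enc = x_enc.matmul(w_enc).relu()                       # arithmetic + comparison under MPC
    y = y_enc.get_plain_text()                               # reveal the result
    if comm.get().get_rank() == 0:
        print("decrypted output shape:", tuple(y.shape))

private_linear_layer()
```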
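The "Experiment Setup" row reports concrete hyper-parameters. The sketch below maps them onto standard Hugging Face TrainingArguments purely for illustration; the two-stage split, the output directories, and the MERGE-specific options dict (dropout rate, λ, noise) are hypothetical placeholders showing where the reported values would go, not the authors' actual training script.

```python
# Sketch of the reported hyper-parameters expressed as Hugging Face TrainingArguments.
# The stage layout and the MERGE-specific options dict are placeholders, not the
# authors' script; see https://github.com/liangzid/MERGE for the real implementation.
from transformers import TrainingArguments

# Stage 1: task fine-tuning (reported: learning rate 3e-5, batch size 4, 3 epochs).
finetune_args = TrainingArguments(
    output_dir="ckpt-finetune",
    learning_rate=3e-5,
    per_device_train_batch_size=4,
    num_train_epochs=3,
)

# Stage 2: MERGE training (reported: 50,000 steps at learning rate 8e-5).
merge_args = TrainingArguments(
    output_dir="ckpt-merge",
    learning_rate=8e-5,
    per_device_train_batch_size=4,
    max_steps=50_000,
)

# Reported MERGE-specific settings; the keys are hypothetical names for illustration.
merge_options = {"dropout_rate": 0.6, "lambda": 0.75, "noise": 0.75}
```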