MERGE: Fast Private Text Generation
Authors: Zi Liang, Pinghui Wang, Ruofei Zhang, Nuo Xu, Shuo Zhang, Lifeng Xing, Haitao Bai, Ziyang Zhou
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that MERGE achieves a 26.5x speedup over the vanilla encrypted model at sequence length 512, reduces communication cost by 80%, and provides up to a 10x speedup over state-of-the-art approximated models. |
| Researcher Affiliation | Academia | MOE KLINNS Lab, Xi'an Jiaotong University, Xi'an 710049, P. R. China {liangzid, zs412082986, xlf20200926, haitao.bai, dakandao}@stu.xjtu.edu.cn, phwang@mail.xjtu.edu.cn, rfzhang@gmail.com, nxu@sei.xjtu.edu.cn |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source code of experiments can be found here: https://github.com/liangzid/MERGE. |
| Open Datasets | Yes | We evaluate MERGE on three representative text generation tasks, including MultiWOZ (Eric et al. 2020), a human-human multi-turn task-oriented dialogue corpus, DailyDialog (Li et al. 2017), a multi-turn chitchat dataset, and CommonGen (Lin et al. 2020), a hard-constrained controlled text generation benchmark. |
| Dataset Splits | No | The paper mentions training parameters and datasets, but does not explicitly provide specific training/validation/test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | Yes | All experiments above are on a single 32 GB Nvidia Tesla V100 GPU. Following previous works (Li et al. 2022), for the private inference experiments we use two 32 GB Nvidia Tesla V100 GPUs to simulate the client and the server, with 10 GbE Ethernet bandwidth. |
| Software Dependencies | No | The paper mentions 'Hugging Face Transformers', 'CrypTen', and 'PyTorch' but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We trained all models under the learning rate 3 × 10^−5, batch size 4 with 3 epochs... train MERGE with 50,000 steps under the learning rate 8 × 10^−5. We set the dropout rate to 0.6, λ to 0.75, and noise to 0.75. (Hedged sketches of this configuration and of the CrypTen-based private inference setup follow the table.) |
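
For reference, the hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. This is a hedged illustration: the dictionary keys below are hypothetical names chosen for readability, not identifiers from the MERGE codebase; only the numeric values come from the paper.

```python
# Hedged sketch of the hyperparameters quoted above. Key names are
# hypothetical; only the values are taken from the paper.
finetune_config = {
    "learning_rate": 3e-5,   # fine-tuning learning rate
    "batch_size": 4,
    "epochs": 3,
}

merge_config = {
    "training_steps": 50_000,
    "learning_rate": 8e-5,
    "dropout": 0.6,
    "lambda_weight": 0.75,   # λ in the paper's training objective
    "noise": 0.75,
}

if __name__ == "__main__":
    print("fine-tuning:", finetune_config)
    print("MERGE training:", merge_config)
```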
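
The Hardware Specification and Software Dependencies rows describe a two-party private inference setup built on CrypTen. The following is a minimal, hedged sketch of how encrypted inference is typically run with CrypTen in a single process; the toy MLP and tensor shapes are placeholders, not the paper's Transformer generators, and the two-GPU client/server simulation over 10 GbE is not reproduced here.

```python
# Minimal CrypTen private-inference sketch (single-process demo).
# The model below is a stand-in; MERGE targets Transformer generators.
import torch
import crypten
import crypten.nn as cnn

crypten.init()

# Stand-in plaintext PyTorch model.
plain_model = torch.nn.Sequential(
    torch.nn.Linear(16, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 16),
)

# Convert to a CrypTen graph and secret-share ("encrypt") its parameters.
dummy_input = torch.empty(1, 16)
private_model = cnn.from_pytorch(plain_model, dummy_input)
private_model.encrypt()
private_model.eval()

# Secret-share the client's input and run the forward pass under MPC.
x_enc = crypten.cryptensor(torch.randn(1, 16))
y_enc = private_model(x_enc)

# Decryption here is only for inspecting the demo output; in a real
# deployment the plaintext is revealed only to the authorized party.
print(y_enc.get_plain_text().shape)
```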