Ditto: Quantization-aware Secure Inference of Transformers upon MPC

Authors: Haoqi Wu, Wenjing Fang, Yancheng Zheng, Junming Ma, Jin Tan, Lei Wang

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on Bert and GPT2 models to evaluate the performance of Ditto. The results demonstrate that Ditto is about 3.14~4.40× faster than MPCFormer (ICLR 2023) and 1.44~2.35× faster than the state-of-the-art work PUMA with negligible utility degradation.
Researcher Affiliation | Industry | Ant Group, Hangzhou, China. Correspondence to: Haoqi Wu <haoqi.whq@antgroup.com>
Pseudocode | Yes | Algorithm 1 Secure UpCast Protocol... Algorithm 2 Approximated GeLU Protocol... Algorithm 3 Approximated Softmax Protocol... Algorithm 4 Secure DownCast Protocol. (See the plaintext fixed-point cast sketch after the table.)
Open Source Code | Yes | The code is available at: https://github.com/secretflow/spu.
Open Datasets | Yes | We use the pre-trained Bert models and GPT models in Hugging Face (Wolf et al., 2020). For Bert, we use Bert-base and Bert-large pre-trained over BookCorpus (Zhu et al., 2015) and English Wikipedia (Wikipedia contributors, 2004) datasets. For GPT, we use GPT2-base and GPT2-medium pre-trained over the Wikitext-103 dataset (Merity et al., 2016). (See the model-loading sketch after the table.)
Dataset Splits | No | We evaluate Bert over RTE, CoLA, QQP and QNLI from GLUE benchmarks (Wang et al., 2019), and GPT2 on the validation set of Wikitext-103. (While "validation set" is mentioned, the explicit split sizes or percentages for reproducibility are not provided in the main text; see the split-loading sketch after the table.)
Hardware Specification | Yes | We conduct the experiments on one CentOS 8 machine equipped with one AMD Ryzen CPU (32 cores and 3.60GHz) and 256GB of RAM.
Software Dependencies | No | We implement Ditto upon the framework SecretFlow-SPU that supports privacy-preserving machine learning. (No specific version numbers are provided for SPU or other software dependencies.)
Experiment Setup | Yes | Experimental setup. We implement Ditto upon the framework SecretFlow-SPU... We conduct the experiments on one CentOS 8 machine equipped with one AMD Ryzen CPU (32 cores and 3.60GHz) and 256GB of RAM. We consider two network environments: 1) LAN setting with a bandwidth of 5Gbps and 0.4ms round-trip time; 2) WAN setting with a bandwidth of 400Mbps and 40ms round-trip time. We simulate the network environments using the Linux tc tool. For Bert models, the input sequence length is set to 128... As for GPT2 models, we generate 1 new token with an input length of 32... Regarding the fine-tuning of Bert models... we use a batch size of 32 for Bert-base and 16 for Bert-large. All the inputs are of sequence length 128. We train the models for 3 epochs... We run a grid search with learning rate in [2e-5, 3e-5, 4e-5, 5e-5]. (The quoted settings are restated as a configuration sketch after the table.)
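
On the Pseudocode row: Algorithms 1 and 4 cast secret-shared fixed-point values between representations of different precision. The sketch below is a plaintext reference for the numerical effect of such casts only, not the paper's MPC protocols (which operate on secret shares); all function names and bit-width choices are illustrative assumptions.

```python
# Plaintext sketch of fixed-point up/down casting between two precisions.
# The paper's Algorithms 1 and 4 perform analogous casts on secret shares
# under MPC; the fraction-bit choices below are illustrative assumptions.

def encode(x: float, frac_bits: int) -> int:
    """Encode a real number as a fixed-point integer with `frac_bits` fraction bits."""
    return int(round(x * (1 << frac_bits)))

def decode(v: int, frac_bits: int) -> float:
    """Decode a fixed-point integer back to a real number."""
    return v / (1 << frac_bits)

def upcast(v: int, src_frac: int, dst_frac: int) -> int:
    """Widen the precision by shifting in extra fraction bits."""
    assert dst_frac >= src_frac
    return v << (dst_frac - src_frac)

def downcast(v: int, src_frac: int, dst_frac: int) -> int:
    """Narrow the precision by truncating surplus fraction bits."""
    assert src_frac >= dst_frac
    return v >> (src_frac - dst_frac)

if __name__ == "__main__":
    x = 3.14159
    low = encode(x, frac_bits=13)   # narrower precision, e.g. for linear layers
    high = upcast(low, 13, 26)      # wider precision, e.g. for softmax/GeLU
    back = downcast(high, 26, 13)
    print(decode(low, 13), decode(high, 26), decode(back, 13))
```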
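On the Open Datasets row: a minimal sketch of fetching the named pre-trained checkpoints through the Hugging Face `transformers` library. The hub identifiers below (`bert-base-uncased`, `bert-large-uncased`, `gpt2`, `gpt2-medium`) are assumptions about which checkpoints correspond to the models the paper names; the paper does not list exact identifiers.

```python
from transformers import AutoModel, AutoTokenizer

# Assumed Hugging Face hub identifiers for the checkpoints named in the row above.
for model_name in ["bert-base-uncased", "bert-large-uncased", "gpt2", "gpt2-medium"]:
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    # Print a rough size check: total number of parameters.
    print(model_name, sum(p.numel() for p in model.parameters()))
```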
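On the Dataset Splits row: since the paper does not state split sizes, the sketch below shows one way to inspect the standard splits of the named evaluation sets with the Hugging Face `datasets` library. The Wikitext configuration name `wikitext-103-raw-v1` is an assumption; the paper only says Wikitext-103.

```python
from datasets import load_dataset

# GLUE tasks used for Bert evaluation in the row above.
for task in ["rte", "cola", "qqp", "qnli"]:
    ds = load_dataset("glue", task)
    print(task, {split: len(ds[split]) for split in ds})

# Validation split of Wikitext-103 used for GPT2 evaluation.
wikitext = load_dataset("wikitext", "wikitext-103-raw-v1", split="validation")
print("wikitext-103 validation examples:", len(wikitext))
```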
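On the Experiment Setup row: the dictionaries below only restate the values quoted in that row (network settings, batch sizes, sequence length, epochs, learning-rate grid); the helper `enumerate_runs` is a hypothetical name for iterating the grid search, not taken from the paper or its code.

```python
# Network environments quoted in the experiment-setup row (simulated with Linux tc).
NETWORK_SETTINGS = {
    "LAN": {"bandwidth": "5Gbps", "round_trip_ms": 0.4},
    "WAN": {"bandwidth": "400Mbps", "round_trip_ms": 40},
}

# Fine-tuning grid restated from the experiment-setup row.
FINETUNE_GRID = {
    "bert-base":  {"batch_size": 32, "seq_len": 128, "epochs": 3,
                   "learning_rates": [2e-5, 3e-5, 4e-5, 5e-5]},
    "bert-large": {"batch_size": 16, "seq_len": 128, "epochs": 3,
                   "learning_rates": [2e-5, 3e-5, 4e-5, 5e-5]},
}

def enumerate_runs():
    """Yield one (model, config) pair per grid-search point."""
    for model, cfg in FINETUNE_GRID.items():
        for lr in cfg["learning_rates"]:
            yield model, {"batch_size": cfg["batch_size"],
                          "seq_len": cfg["seq_len"],
                          "epochs": cfg["epochs"],
                          "learning_rate": lr}

if __name__ == "__main__":
    for model, run_cfg in enumerate_runs():
        print(model, run_cfg)
```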