Nimbus: Secure and Efficient Two-Party Inference for Transformers
Authors: Zhengyi Li, Kang Yang, Jin Tan, Wen-jie Lu, Haoqi Wu, Xiao Wang, Yu Yu, Derun Zhao, Yancheng Zheng, Minyi Guo, Jingwen Leng
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the performance of Nimbus using the popular Transformer model BERTbase under both LAN and WAN settings. Table 2 reports the accuracy of floating-point plaintext, BumbleBee, and our approximation across 8 tasks in the GLUE benchmark [37]. |
| Researcher Affiliation | Collaboration | Zhengyi Li¹, Kang Yang³, Jin Tan⁴, Wen-jie Lu⁴, Haoqi Wu⁴, Xiao Wang⁵, Yu Yu¹,², Derun Zhao⁴, Yancheng Zheng⁴, Minyi Guo¹,², Jingwen Leng¹,²; ¹Shanghai Jiao Tong University, ²Shanghai Qizhi Institute, ³State Key Laboratory of Cryptology, ⁴Ant Group, ⁵Northwestern University |
| Pseudocode | Yes | Algorithm 1 Secure Matrix Multiplication Protocol of Nimbus |
| Open Source Code | Yes | The code is available at: https://github.com/secretflow/spu. |
| Open Datasets | Yes | Our method is evaluated on widely used Transformer model BERTbase [19] from Hugging Face [38]. To evaluate the accuracy of our non-linear approximation, we test it on eight datasets from widely used GLUE benchmark [37]. |
| Dataset Splits | No | The paper mentions using a 'training dataset' and evaluating on 'GLUE benchmark' datasets, but it does not specify explicit training/validation/test splits (e.g., percentages or sample counts). |
| Hardware Specification | Yes | The performances are evaluated on two nodes with 64 vCPUs and 128 GB memory. |
| Software Dependencies | No | The paper mentions using 'SecretFlow [28]' and 'Hugging Face [38]' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | Except optimized non-linear functions using ring $\mathbb{Z}_{2^{32}}$ and precision s = 12, other operations follow standard $\mathbb{Z}_{2^{64}}$ and s = 18 for the secret sharing. We use N = 8192 for the HE encryption. The performances are evaluated on two nodes with 64 vCPUs and 128 GB memory. We use Linux Traffic Control (tc) to simulate LAN and WAN network settings, where the bandwidth and the ping latency are (3Gbps, 1ms) and (400Mbps, 10ms), respectively. ... When evaluating the performance, we use 128 as a mild average number of the input sequence length. |
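
The ring and precision settings quoted in the Experiment Setup row can be illustrated with a small sketch of fixed-point encoding and two-party additive secret sharing. This is a plain Python illustration under stated assumptions, not the Nimbus or SPU implementation: the function names (`encode`, `decode`, `share`, `reconstruct`) are hypothetical, and round-to-nearest encoding is an assumption rather than something the excerpt specifies.

```python
import secrets

def encode(x: float, ring_bits: int, s: int) -> int:
    """Map a real number to a fixed-point element of Z_{2^ring_bits} with s fractional bits."""
    return round(x * (1 << s)) % (1 << ring_bits)

def decode(v: int, ring_bits: int, s: int) -> float:
    """Interpret a ring element as a signed fixed-point number."""
    modulus = 1 << ring_bits
    if v >= modulus // 2:  # upper half of the ring represents negative values
        v -= modulus
    return v / (1 << s)

def share(v: int, ring_bits: int) -> tuple[int, int]:
    """Split v into two additive shares that sum to v modulo 2^ring_bits."""
    modulus = 1 << ring_bits
    r = secrets.randbelow(modulus)  # uniformly random mask held by one party
    return r, (v - r) % modulus

def reconstruct(a: int, b: int, ring_bits: int) -> int:
    """Recombine two additive shares."""
    return (a + b) % (1 << ring_bits)

# Settings quoted in the table: (32-bit ring, s = 12) for the optimized
# non-linear functions and (64-bit ring, s = 18) for other operations.
for ring_bits, s in [(32, 12), (64, 18)]:
    x = -3.1415926
    v = encode(x, ring_bits, s)
    a, b = share(v, ring_bits)
    x_back = decode(reconstruct(a, b, ring_bits), ring_bits, s)
    # Round-to-nearest encoding keeps the error within half a unit in the last place.
    print(ring_bits, s, x_back, abs(x - x_back) <= 2 ** -(s + 1))
```

In this encoding, a 32-bit ring with s = 12 represents values in roughly [-2^19, 2^19) with a resolution of 2^-12, while the 64-bit ring with s = 18 offers both a wider range and a finer resolution at the cost of larger shares.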