Nimbus: Secure and Efficient Two-Party Inference for Transformers

Authors: Zhengyi Li, Kang Yang, Jin Tan, Wen-jie Lu, Haoqi Wu, Xiao Wang, Yu Yu, Derun Zhao, Yancheng Zheng, Minyi Guo, Jingwen Leng

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the performance of Nimbus using the popular Transformer model BERT-base under both LAN and WAN settings. Table 2 reports the accuracy of floating-point plaintext, BumbleBee, and our approximation across 8 tasks in the GLUE benchmark [37].
Researcher Affiliation | Collaboration | Zhengyi Li (1), Kang Yang (3), Jin Tan (4), Wen-jie Lu (4), Haoqi Wu (4), Xiao Wang (5), Yu Yu (1, 2), Derun Zhao (4), Yancheng Zheng (4), Minyi Guo (1, 2), Jingwen Leng (1, 2). Affiliations: (1) Shanghai Jiao Tong University; (2) Shanghai Qizhi Institute; (3) State Key Laboratory of Cryptology; (4) Ant Group; (5) Northwestern University.
Pseudocode | Yes | Algorithm 1: Secure Matrix Multiplication Protocol of Nimbus.
Open Source Code | Yes | The code is available at https://github.com/secretflow/spu.
Open Datasets | Yes | Our method is evaluated on the widely used Transformer model BERT-base [19] from Hugging Face [38]. To evaluate the accuracy of our non-linear approximation, we test it on eight datasets from the widely used GLUE benchmark [37].
Dataset Splits | No | The paper mentions using a "training dataset" and evaluating on GLUE benchmark datasets, but it does not specify explicit training/validation/test splits (e.g., percentages or sample counts).
Hardware Specification | Yes | Performance is evaluated on two nodes with 64 vCPUs and 128 GB of memory.
Software Dependencies | No | The paper mentions using SecretFlow [28] and Hugging Face [38] but does not give version numbers for these or any other software dependencies.
Experiment Setup | Yes | Except for the optimized non-linear functions, which use the ring Z_{2^32} with precision s = 12, all other operations use the standard ring Z_{2^64} with s = 18 for secret sharing. We use N = 8192 for the HE encryption. Performance is evaluated on two nodes with 64 vCPUs and 128 GB of memory. We use Linux Traffic Control (tc) to simulate the LAN and WAN network settings, with bandwidth and ping latency of (3 Gbps, 1 ms) and (400 Mbps, 10 ms), respectively. ... When evaluating performance, we use 128 as a moderate average input sequence length.
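The two secret-sharing configurations above, (Z_{2^32}, s = 12) and (Z_{2^64}, s = 18), can be illustrated with a small Python sketch of fixed-point encoding combined with two-party additive sharing. The sketch assumes the common two's-complement encoding round(x * 2^s) mod 2^ℓ. The function names (encode, decode, share, reconstruct, add_shares) are hypothetical. This is not the API of the secretflow/spu code, and it is not the Nimbus protocol itself.

```python
import secrets


def encode(x: float, ell: int, s: int) -> int:
    """Fixed-point encode x as round(x * 2^s) mod 2^ell (two's complement)."""
    return round(x * (1 << s)) % (1 << ell)


def decode(v: int, ell: int, s: int) -> float:
    """Interpret v as a signed ell-bit integer and remove the 2^s scale."""
    if v >= 1 << (ell - 1):
        v -= 1 << ell
    return v / (1 << s)


def share(v: int, ell: int) -> tuple[int, int]:
    """Split ring element v into two additive shares: v0 + v1 = v mod 2^ell."""
    v0 = secrets.randbelow(1 << ell)  # uniformly random mask
    return v0, (v - v0) % (1 << ell)


def reconstruct(v0: int, v1: int, ell: int) -> int:
    """Recombine both parties' shares into the ring element."""
    return (v0 + v1) % (1 << ell)


def add_shares(a: tuple[int, int], b: tuple[int, int], ell: int) -> tuple[int, int]:
    """Addition is local: each party adds its own shares, with no communication."""
    return (a[0] + b[0]) % (1 << ell), (a[1] + b[1]) % (1 << ell)


if __name__ == "__main__":
    for ell, s in [(32, 12), (64, 18)]:
        sx = share(encode(-3.14159, ell, s), ell)
        sy = share(encode(2.5, ell, s), ell)
        z0, z1 = add_shares(sx, sy, ell)
        total = decode(reconstruct(z0, z1, ell), ell, s)
        # Share size, fixed-point resolution, and approximate representable range.
        print(f"ring 2^{ell}, s={s}: {ell // 8}-byte shares, "
              f"resolution 2^-{s} = {2 ** -s:.2e}, "
              f"|x| < about 2^{ell - 1 - s}, x + y ~ {total:.6f}")
```

In the 32-bit ring, each share takes 4 bytes instead of 8, so every secret-shared element costs half as much to send. The cost is coarser resolution (2^-12 vs 2^-18) and much less headroom. For example, multiplying two values encoded at s = 12 gives an intermediate at scale 2^24 that must then be truncated by s bits. In a 32-bit ring, that intermediate overflows once the real product reaches about 2^(31-24) = 128. This is presumably why the smaller ring is reserved for the non-linear functions whose input ranges have been analyzed, though that reading is an inference and not a statement from the paper.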