Privacy-Preserving Embedding via Look-up Table Evaluation with Fully Homomorphic Encryption
Authors: Jae-Yun Kim, Saerom Park, Joohee Lee, Jung Hee Cheon
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Experiments In this section, we demonstrate the efficiency of HELUT by using the synthetic table and large-scale real-world word embeddings. ... Table 1 shows the results of evaluating LUT for 2^12 input indices with four HELUT implementations: HELUT-LT, HELUT-CI, Coded HELUT, and Coded HELUT with parallelization p1. ... Table 2 shows the amortized running times of HELUT evaluation, as well as the compressed embedding performance in terms of MSE (mean squared error between original and compressed embeddings). |
| Researcher Affiliation | Academia | 1 Department of Mathematical Sciences, Seoul National University, Seoul, South Korea; 2 Department of Industrial Engineering, Ulsan National Institute of Science and Technology, Ulsan, South Korea; 3 Department of Convergence Security Engineering, Sungshin Women's University, Seoul, South Korea. |
| Pseudocode | Yes | Algorithm 1 Parameter Selection of r, u for Sq Method ... Algorithm 3 HELUT with Coded Input (HELUT-CI) ... Algorithm 4 Coded HELUT Evaluation |
| Open Source Code | No | Our HE implementation was based on OpenFHE (Al Badawi et al., 2022). For embedding compression, we used the public code of (Shu & Nakayama, 2018; Kim, 2018) with PyTorch 1.10.0 (Python 3) for the pre-trained embeddings. |
| Open Datasets | Yes | For comparison, we utilized a synthetically generated table (T : Z_64 → R^16) and the real-world large-scale NLP embeddings GloVe 6B50d, GloVe 42B300d (Pennington et al., 2014a;b), BERT (Bidirectional Encoder Representations from Transformers) (Pires et al., 2019) and GPT-2 (Generative Pre-trained Transformer) (Radford et al., 2019). ... Additionally, we conducted an experiment to demonstrate the effectiveness of the compressed embedding for the downstream sentiment analysis task on the IMDB dataset (Maas et al., 2011). |
| Dataset Splits | Yes | The IMDB dataset contains 25,000 reviews for both the training and validation sets. |
| Hardware Specification | Yes | Our experiments were conducted on the server with Intel Xeon 6426Y at 2.5GHz. |
| Software Dependencies | Yes | Our HE implementation was based on OpenFHE (Al Badawi et al., 2022). For embedding compression, we used the public code of (Shu & Nakayama, 2018; Kim, 2018) with PyTorch 1.10.0 (Python 3) for the pre-trained embeddings. |
| Experiment Setup | Yes | We trained the model with the code embedding matrices and the discrete codes that are constructed by a neural network with the Gumbel-Softmax trick, and the Adam optimizer with a learning rate of 0.001. ... We conducted the training for 200K iterations and evaluated MSE every 1,000 iterations. ... The classifier was trained for up to 50 epochs using GloVe 42B300d embeddings and up to 100 epochs using GloVe 6B50d embeddings. |
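
For context, the embedding-compression setup quoted in the Experiment Setup row (code embedding matrices and discrete codes learned by a neural network with the Gumbel-Softmax trick, Adam with learning rate 0.001, MSE between original and reconstructed embeddings) can be sketched roughly as follows. This is a minimal PyTorch illustration assuming a compositional-code architecture in the style of Shu & Nakayama (2018); the class name `CodeCompressor`, the encoder layout, the codebook dimensions, and the batch size are illustrative assumptions, not values taken from the paper.

```python
# Hypothetical sketch of compositional code embedding compression
# (Shu & Nakayama, 2018 style), assuming M codebooks of K codes each.
# Names, dimensions, and architecture details are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CodeCompressor(nn.Module):
    def __init__(self, emb_dim=300, num_codebooks=32, codes_per_book=16, hidden=256):
        super().__init__()
        self.M, self.K = num_codebooks, codes_per_book
        # Encoder: original embedding -> logits over M x K discrete codes
        self.encoder = nn.Sequential(
            nn.Linear(emb_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, num_codebooks * codes_per_book),
        )
        # Learnable code embedding matrices (the compressed representation)
        self.codebooks = nn.Parameter(torch.randn(num_codebooks, codes_per_book, emb_dim))

    def forward(self, emb, tau=1.0):
        logits = self.encoder(emb).view(-1, self.M, self.K)
        # Gumbel-Softmax trick: differentiable relaxation of discrete code selection
        codes = F.gumbel_softmax(logits, tau=tau, hard=False, dim=-1)  # (B, M, K)
        # Reconstruct the embedding as the sum of selected codewords over codebooks
        recon = torch.einsum('bmk,mkd->bd', codes, self.codebooks)
        return recon

# Training loop: Adam with learning rate 0.001, MSE between original and
# reconstructed (compressed) embeddings, as in the quoted experiment setup.
pretrained = torch.randn(10000, 300)          # stand-in for rows of a pre-trained embedding (e.g., GloVe)
model = CodeCompressor()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for step in range(1000):                      # the paper reports 200K iterations
    batch = pretrained[torch.randint(0, pretrained.size(0), (128,))]
    recon = model(batch)
    loss = F.mse_loss(recon, batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % 100 == 0:
        print(f"step {step}: MSE = {loss.item():.6f}")
```

In the actual pipeline, the learned discrete codes and code embedding matrices would stand in for the full embedding table; the sketch above only illustrates the training objective quoted in the Experiment Setup row, not the homomorphic look-up itself.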