Privacy-Preserving Embedding via Look-up Table Evaluation with Fully Homomorphic Encryption
Authors: Jae-Yun Kim, Saerom Park, Joohee Lee, Jung Hee Cheon
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Experiments In this section, we demonstrate the efficiency of HELUT by using the synthetic table and large-scale real-world word embeddings. ... Table 1 shows the results of evaluating LUT for 2^12 input indices with four HELUT implementations: HELUT-LT, HELUT-CI, Coded HELUT, and Coded HELUT with parallelization p1. ... Table 2 shows the amortized running times of HELUT evaluation, as well as the compressed embedding performance in terms of MSE (mean squared error between original and compressed embeddings). |
| Researcher Affiliation | Academia | 1 Department of Mathematical Sciences, Seoul National University, Seoul, South Korea; 2 Department of Industrial Engineering, Ulsan National Institute of Science and Technology, Ulsan, South Korea; 3 Department of Convergence Security Engineering, Sungshin Women's University, Seoul, South Korea. |
| Pseudocode | Yes | Algorithm 1 Parameter Selection of r, u for Sq Method ... Algorithm 3 HELUT with Coded Input (HELUT-CI) ... Algorithm 4 Coded HELUT Evaluation |
| Open Source Code | No | Our HE implementation was based on OpenFHE (Al Badawi et al., 2022). For embedding compression, we used the public code of (Shu & Nakayama, 2018; Kim, 2018) with PyTorch 1.10.0 (Python 3) for the pre-trained embeddings. |
| Open Datasets | Yes | For comparison, we utilized a synthetically generated table (T : Z_64 → R^16) and the real-world large-scale NLP embeddings GloVe 6B50d, GloVe 42B300d (Pennington et al., 2014a;b), BERT (Bidirectional Encoder Representations from Transformers) (Pires et al., 2019) and GPT-2 (Generative Pre-trained Transformer) (Radford et al., 2019). ... Additionally, we conducted an experiment to demonstrate the effectiveness of the compressed embedding for the downstream sentiment analysis task on the IMDB dataset (Maas et al., 2011). |
| Dataset Splits | Yes | The IMDB dataset contains 25,000 reviews for both the training and validation sets. |
| Hardware Specification | Yes | Our experiments were conducted on the server with Intel Xeon 6426Y at 2.5GHz. |
| Software Dependencies | Yes | Our HE implementation was based on OpenFHE (Al Badawi et al., 2022). For embedding compression, we used the public code of (Shu & Nakayama, 2018; Kim, 2018) with PyTorch 1.10.0 (Python 3) for the pre-trained embeddings. |
| Experiment Setup | Yes | We trained the model with the code embedding matrices and the discrete codes that are constructed by a neural network with the Gumbel-Softmax trick, and the Adam optimizer with a learning rate of 0.001. ... We conducted the training for 200K iterations and evaluated MSE every 1,000 iterations. ... The classifier was trained for up to 50 epochs using GloVe 42B300d embeddings and up to 100 epochs using GloVe 6B50d embeddings. |
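
For context, the embedding-compression setup quoted in the Experiment Setup row (code embedding matrices and discrete codes learned by a neural network with the Gumbel-Softmax trick, Adam with learning rate 0.001, MSE between original and reconstructed embeddings) can be sketched roughly as follows. This is a minimal PyTorch illustration assuming a compositional-code architecture in the style of Shu & Nakayama (2018); the class name `CodeCompressor`, the encoder layout, the codebook dimensions, and the batch size are illustrative assumptions, not values taken from the paper.

```python
# Hypothetical sketch of compositional code embedding compression
# (Shu & Nakayama, 2018 style), assuming M codebooks of K codes each.
# Names, dimensions, and architecture details are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CodeCompressor(nn.Module):
    def __init__(self, emb_dim=300, num_codebooks=32, codes_per_book=16, hidden=256):
        super().__init__()
        self.M, self.K = num_codebooks, codes_per_book
        # Encoder: original embedding -> logits over M x K discrete codes
        self.encoder = nn.Sequential(
            nn.Linear(emb_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, num_codebooks * codes_per_book),
        )
        # Learnable code embedding matrices (the compressed representation)
        self.codebooks = nn.Parameter(torch.randn(num_codebooks, codes_per_book, emb_dim))

    def forward(self, emb, tau=1.0):
        logits = self.encoder(emb).view(-1, self.M, self.K)
        # Gumbel-Softmax trick: differentiable relaxation of discrete code selection
        codes = F.gumbel_softmax(logits, tau=tau, hard=False, dim=-1)  # (B, M, K)
        # Reconstruct the embedding as the sum of selected codewords over codebooks
        recon = torch.einsum('bmk,mkd->bd', codes, self.codebooks)
        return recon

# Training loop: Adam with learning rate 0.001, MSE between original and
# reconstructed (compressed) embeddings, as in the quoted experiment setup.
pretrained = torch.randn(10000, 300)          # stand-in for rows of a pre-trained embedding (e.g., GloVe)
model = CodeCompressor()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for step in range(1000):                      # the paper reports 200K iterations
    batch = pretrained[torch.randint(0, pretrained.size(0), (128,))]
    recon = model(batch)
    loss = F.mse_loss(recon, batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % 100 == 0:
        print(f"step {step}: MSE = {loss.item():.6f}")
```

In the actual pipeline, the learned discrete codes and code embedding matrices would stand in for the full embedding table; the sketch above only illustrates the training objective quoted in the Experiment Setup row, not the homomorphic look-up itself.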