HETAL: Efficient Privacy-preserving Transfer Learning with Homomorphic Encryption

Authors: Seewoo Lee, Garam Lee, Jung Woo Kim, Junbum Shin, Mun-Kyu Lee

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results for the five benchmark datasets show total training times of 567 to 3442 seconds, i.e., less than an hour. We implemented and evaluated HETAL using five well-known benchmark datasets (MNIST (Deng, 2012), CIFAR-10 (Krizhevsky et al., 2009), Face Mask Detection (Larxel, 2020), DermaMNIST (Yang et al., 2023), and SNIPS (Coucke et al., 2018)), in addition to two pre-trained models (ViT (Dosovitskiy et al., 2021) and MPNet (Song et al., 2020)). Our experimental results showed training times of 4.29 to 15.72 seconds per iteration and total training times of 567 to 3442 seconds (less than an hour), with almost the same classification accuracy as non-encrypted training. The accuracy loss from encrypted training was at most 0.5%.
Researcher Affiliation | Collaboration | ¹University of California, Berkeley, US; ²CryptoLab Inc., Seoul, South Korea; ³Inha University, Incheon, South Korea
Pseudocode | Yes | Algorithm 1 Row-wise softmax approximation; Algorithm 2 DiagABT: Homomorphic evaluation of AB^T; Algorithm 3 RotLeft(A, k); Algorithm 4 PRotUp(B, k); Algorithm 5 DiagATB: Homomorphic evaluation of A^T B
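The listed algorithms operate on ciphertexts; as a plaintext point of reference, the sketch below (NumPy, not the authors' HEaaN-based implementation, with illustrative function names) shows the unencrypted quantities they are designed to evaluate: a numerically stable row-wise softmax and the matrix products AB^T and A^T B.

```python
import numpy as np

def rowwise_softmax(Z):
    """Plaintext reference for the row-wise softmax approximated by Algorithm 1
    (max-subtraction keeps the exponentials numerically stable)."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def abt_reference(A, B):
    """Plaintext result targeted by DiagABT (Algorithm 2): A @ B^T."""
    return A @ B.T

def atb_reference(A, B):
    """Plaintext result targeted by DiagATB (Algorithm 5): A^T @ B."""
    return A.T @ B
```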
Open Source Code | Yes | Our codes for the experiments are available at https://github.com/CryptoLabInc/HETAL.
Open Datasets | Yes | We used five benchmark datasets for image classification and sentiment analysis: MNIST (Deng, 2012), CIFAR-10 (Krizhevsky et al., 2009), Face Mask Detection (Larxel, 2020), DermaMNIST (Yang et al., 2023), and SNIPS (Coucke et al., 2018).
Dataset Splits | Yes | The client extracts features from training and validation data. Table 4 describes the number of samples in each split (train, validation, test) for each benchmark. The splits are already given for the DermaMNIST and SNIPS datasets; for the other datasets, we randomly split the original training sets into train and validation sets at a ratio of 7:1.
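As a concrete illustration of the 7:1 split, the snippet below uses scikit-learn (an assumption; the paper does not name the splitting tool, seed, or stratification) on placeholder features standing in for the client-extracted embeddings: holding out 1/8 of the original training set yields a 7:1 train/validation ratio.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features/labels standing in for the client-extracted
# ViT/MPNet embeddings of the original training set.
X = np.random.randn(1000, 768)
y = np.random.randint(0, 10, size=1000)

# A 7:1 train/validation ratio corresponds to holding out 1/8 of the data.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=1 / 8, random_state=0, stratify=y
)
print(len(X_train), len(X_val))  # 875 125
```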
Hardware Specification | Yes | We used an Intel Xeon Gold 6248 CPU at 2.50 GHz, running with 64 threads, and a single Nvidia Ampere A40 GPU.
Software Dependencies | No | The paper mentions 'HEaaN (CryptoLab)' and 'NumPy (Harris et al., 2020)' as software used but does not provide specific version numbers for these libraries or for the programming language (Python) used.
Experiment Setup | Yes | For early stopping, we set the patience to 3. Table 1 shows that we fine-tuned the encrypted models on all benchmark datasets within an hour. In addition, the accuracy drops of the encrypted models were at most 0.51% compared to the unencrypted models with the same hyperparameters. (See Table 5 in the Appendix for the hyperparameters and the number of epochs used for early stopping.)
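The patience-3 criterion can be read as the standard early-stopping rule; the sketch below is a generic Python illustration (not the authors' HEaaN-based training loop), where `run_epoch` and `validate` are hypothetical callbacks for one training pass and a validation-accuracy evaluation.

```python
def train_with_early_stopping(run_epoch, validate, max_epochs, patience=3):
    """Stop after `patience` consecutive epochs without improvement
    in validation accuracy, returning the best accuracy observed."""
    best_acc, stale_epochs = 0.0, 0
    for _ in range(max_epochs):
        run_epoch()        # one pass of (encrypted) fine-tuning
        acc = validate()   # accuracy on the validation split
        if acc > best_acc:
            best_acc, stale_epochs = acc, 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break
    return best_acc
```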