Secure Quantized Training for Deep Learning
Authors: Marcel Keller, Ke Sun
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implement training of neural networks in secure multi-party computation (MPC) using quantization commonly used in said setting. We are the first to present an MNIST classifier purely trained in MPC that comes within 0.2 percent of the accuracy of the same convolutional neural network trained via plaintext computation. More concretely, we have trained a network with two convolutional and two dense layers to 99.2% accuracy in 3.5 hours (under one hour for 99% accuracy). We have also implemented AlexNet for CIFAR-10, which converges in a few hours. We develop novel protocols for exponentiation and inverse square root. Finally, we present experiments in a range of MPC security models for up to ten parties, both with honest and dishonest majority as well as semi-honest and malicious security. (Illustrative plaintext sketches of the exponentiation and inverse-square-root primitives follow the table.) |
| Researcher Affiliation | Collaboration | CSIRO's Data61, Sydney, Australia; The Australian National University. |
| Pseudocode | Yes | Algorithm 1: Exponentiation with base two (Aly & Smart, 2019). (See the exponentiation sketch after the table.) |
| Open Source Code | Yes | Code available at https://github.com/data61/MP-SPDZ. |
| Open Datasets | Yes | For a concrete measurement of accuracy and running times, we train a multi-class classifier for the widely-used MNIST dataset (LeCun et al., 2010). We have also implemented AlexNet for CIFAR-10, which converges in a few hours. |
| Dataset Splits | Yes | We use SGD with learning rate 0.01, batch size 128, and the usual MNIST training/test split. Test/training split: We have used the usual MNIST split. |
| Hardware Specification | Yes | We use the CPU of one AWS c5.9xlarge instance per party whereas Tan et al. use one NVIDIA Tesla V100 GPU per party. |
| Software Dependencies | No | We build our implementation on MP-SPDZ by Keller (2020). Other software, such as TensorFlow and Keras, is mentioned without explicit version numbers. |
| Experiment Setup | Yes | We use SGD with learning rate 0.01, batch size 128, and the usual MNIST training/test split. In the following we discuss our choice of hyperparameters. Number of epochs: As we found convergence after 100 epochs, we have run most of our benchmarks for 150 epochs... Mini-batch size: We have used 128 throughout... Learning rate: ...we settled for 0.01 for SGD and 0.001 for AMSGrad... Hyperparameters for Adam/AMSGrad: We use the common choice β1 = 0.9, β2 = 0.999, and ϵ = 10^-8. (A plaintext sketch of this setup follows the table.) |
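
The paper's Algorithm 1 computes base-two exponentiation on quantized (fixed-point) values. As a rough illustration of the underlying numerics, here is a plaintext Python sketch that splits a fixed-point input into integer and fractional parts, handles the integer part as an exact power of two, and approximates 2^frac with a low-degree polynomial. The precision `F`, the Taylor coefficients, and the helper names are assumptions for illustration; this is not the MPC protocol, which operates on secret-shared values.

```python
import math

F = 16  # number of fractional bits in the fixed-point encoding (assumed)

def to_fixed(x: float) -> int:
    """Encode a real number as an integer with F fractional bits."""
    return round(x * (1 << F))

def from_fixed(v: int) -> float:
    """Decode a fixed-point integer back to a float."""
    return v / (1 << F)

def pow2_fixed(v: int) -> int:
    """Approximate 2**x for a fixed-point encoded x = v / 2**F."""
    int_part = v >> F                        # floor(x), handled exactly
    frac = (v & ((1 << F) - 1)) / (1 << F)   # x - floor(x), in [0, 1)
    # Crude degree-4 Taylor expansion of 2**frac = exp(frac * ln 2);
    # the actual protocol uses a more accurate approximation.
    t = frac * math.log(2)
    approx = 1 + t + t**2 / 2 + t**3 / 6 + t**4 / 24
    return to_fixed(approx * 2.0 ** int_part)

print(from_fixed(pow2_fixed(to_fixed(3.25))))   # ~9.514  (2**3.25 ≈ 9.5137)
print(from_fixed(pow2_fixed(to_fixed(-1.5))))   # ~0.3536 (2**-1.5 ≈ 0.3536)
```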
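
The abstract also mentions a novel protocol for inverse square root, which Adam/AMSGrad-style training needs for dividing by the root of the second-moment estimate. The sketch below only illustrates the function being approximated, not the paper's protocol: it starts from a power-of-two estimate (which, in the quantized setting, can be derived from the bit length of the input) and refines it with Newton iterations. The function name and iteration count are assumptions.

```python
import math

def inv_sqrt(x: float, iterations: int = 5) -> float:
    """Plaintext Newton iteration for 1/sqrt(x); not the MPC protocol."""
    assert x > 0
    # Power-of-two initial guess within a factor of sqrt(2) of the result.
    e = math.floor(math.log2(x))
    y = 2.0 ** (-((e + 1) // 2))
    for _ in range(iterations):
        # Newton step for f(y) = 1/y**2 - x.
        y = y * (1.5 - 0.5 * x * y * y)
    return y

print(inv_sqrt(2.0), 1 / math.sqrt(2.0))   # both ≈ 0.7071
print(inv_sqrt(0.5), 1 / math.sqrt(0.5))   # both ≈ 1.4142
```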
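
For context on the reported hyperparameters, the following is a minimal plaintext Keras sketch of an equivalent training run: the standard MNIST train/test split, batch size 128, SGD at learning rate 0.01 (or AMSGrad at 0.001 with β1 = 0.9, β2 = 0.999, ϵ = 10^-8), and 150 epochs. The layer widths and kernel sizes are placeholders; the paper only states that the network has two convolutional and two dense layers, and the actual experiments run inside MP-SPDZ rather than TensorFlow.

```python
import tensorflow as tf

# Standard MNIST train/test split, scaled to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0
x_test = x_test[..., None] / 255.0

# Two convolutional and two dense layers; the sizes are placeholders.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 5, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(16, 5, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# SGD variant from the table; swap in the commented line for the
# AMSGrad hyperparameters instead.
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
# optimizer = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9,
#                                      beta_2=0.999, epsilon=1e-8, amsgrad=True)

model.compile(optimizer=optimizer,
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=128, epochs=150,
          validation_data=(x_test, y_test))
```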