A Scalable Approach for Privacy-Preserving Collaborative Machine Learning
Authors: Jinhyun So, Basak Guler, Salman Avestimehr
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Furthermore, we experimentally demonstrate that COPML can achieve significant speedup in training over the benchmark protocols. Our protocol provides strong statistical privacy guarantees against colluding parties (adversaries) with unbounded computational power, while achieving up to 16× speedup in the training time against the benchmark protocols. |
| Researcher Affiliation | Academia | J. So, ECE Department, University of Southern California (USC), jinhyuns@usc.edu; B. Guler, ECE Department, University of California, Riverside, bguler@ece.ucr.edu; A. S. Avestimehr, ECE Department, University of Southern California (USC), avestimehr@ee.usc.edu |
| Pseudocode | Yes | The overall algorithm for COPML is presented in Appendix A.5. |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We train a logistic regression model for binary image classification on the CIFAR-10 [23] and GISETTE [18] datasets, whose size is (m, d) = (9019, 3073) and (6000, 5000), respectively. |
| Dataset Splits | No | The paper specifies the number of samples for training and testing for CIFAR-10 and GISETTE datasets but does not explicitly mention a separate validation set or describe validation splits. |
| Hardware Specification | Yes | Computations are carried out on Amazon EC2 m3.xlarge machine instances. |
| Software Dependencies | No | The paper mentions "MPI4Py [11] interface on Python" but does not specify version numbers for either MPI4Py or Python, which are necessary for a reproducible description. |
| Experiment Setup | Yes | We determine T (privacy threshold) and K (amount of parallelization) in COPML as follows. Initially, we have from Theorem 1 that these parameters must satisfy N ≥ (2r + 1)(K + T − 1) + 1 for our framework. Next, we have considered both r = 1 and r = 3 for the degree of the polynomial approximation of the sigmoid function and observed that the degree one approximation achieves good accuracy, as we demonstrate later. Given our choice of r = 1, we then consider two setups. Case 1 (maximum parallelization gain): allocate all resources to parallelization (fastest training), by letting K = ⌊(N − 1)/3⌋ and T = 1. Case 2 (equal parallelization and privacy gain): split resources almost equally between parallelization and privacy, i.e., T = ⌊(N − 3)/6⌋ and K = ⌊(N + 2)/3⌋ − T. In all schemes, we apply the MPC truncation protocol from Section 3 to carry out the multiplication with η/m during model updates, by choosing (k1, k2) = (21, 24) and (22, 24) for the CIFAR-10 and GISETTE datasets, respectively. |
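
The parameter choices quoted in the experiment-setup row follow closed-form rules, so they are easy to sanity-check. The sketch below is illustrative only (the paper does not release code): it evaluates the Case 1 and Case 2 choices of (K, T) for a given number of workers N and polynomial degree r, and asserts the Theorem 1 constraint N ≥ (2r + 1)(K + T − 1) + 1. The function name `copml_parameters` is our own, not from the paper.

```python
# Illustrative sketch of COPML's parameter selection (not the authors' code).
# Constraint from Theorem 1: N >= (2r + 1)(K + T - 1) + 1.

def copml_parameters(N: int, r: int = 1):
    """Return (K, T) for Case 1 (max parallelization) and Case 2 (equal split)."""
    # Case 1: all resources go to parallelization, so the privacy threshold is T = 1.
    T1 = 1
    K1 = (N - 1) // 3

    # Case 2: split resources almost equally between privacy and parallelization.
    T2 = (N - 3) // 6
    K2 = (N + 2) // 3 - T2

    # Both cases must satisfy the Theorem 1 constraint (checked here for r = 1 by default).
    for K, T in ((K1, T1), (K2, T2)):
        assert N >= (2 * r + 1) * (K + T - 1) + 1, "Theorem 1 constraint violated"
    return (K1, T1), (K2, T2)


if __name__ == "__main__":
    for N in (10, 20, 30, 40, 50):
        case1, case2 = copml_parameters(N)
        print(f"N={N}: Case 1 (K, T)={case1}, Case 2 (K, T)={case2}")
```

For example, with N = 50 workers this yields (K, T) = (16, 1) for Case 1 and (10, 7) for Case 2, both of which satisfy 3(K + T − 1) + 1 ≤ 50 for the degree-one approximation (r = 1).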