A Scalable Approach for Privacy-Preserving Collaborative Machine Learning

Authors: Jinhyun So, Basak Guler, Salman Avestimehr

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Furthermore, we experimentally demonstrate that COPML can achieve significant speedup in training over the benchmark protocols. Our protocol provides strong statistical privacy guarantees against colluding parties (adversaries) with unbounded computational power, while achieving up to 16× speedup in the training time against the benchmark protocols.
Researcher Affiliation | Academia | J. So, ECE Department, University of Southern California (USC), jinhyuns@usc.edu; B. Guler, ECE Department, University of California, Riverside, bguler@ece.ucr.edu; A. S. Avestimehr, ECE Department, University of Southern California (USC), avestimehr@ee.usc.edu
Pseudocode | Yes | The overall algorithm for COPML is presented in Appendix A.5.
Open Source Code | No | The paper does not provide any explicit statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We train a logistic regression model for binary image classification on the CIFAR-10 [23] and GISETTE [18] datasets, whose size is (m, d) = (9019, 3073) and (6000, 5000), respectively.
Dataset Splits | No | The paper specifies the number of training and test samples for the CIFAR-10 and GISETTE datasets, but it does not mention a separate validation set or describe validation splits.
Hardware Specification | Yes | Computations are carried out on Amazon EC2 m3.xlarge machine instances.
Software Dependencies | No | The paper mentions the "MPI4Py [11] interface on Python" but does not specify version numbers for either MPI4Py or Python, which are necessary for a reproducible description.
Experiment Setup | Yes | We determine T (privacy threshold) and K (amount of parallelization) in COPML as follows. Initially, we have from Theorem 1 that these parameters must satisfy N ≥ (2r + 1)(K + T − 1) + 1 for our framework. Next, we have considered both r = 1 and r = 3 for the degree of the polynomial approximation of the sigmoid function and observed that the degree-one approximation achieves good accuracy, as we demonstrate later. Given our choice of r = 1, we then consider two setups. Case 1 (maximum parallelization gain): allocate all resources to parallelization (fastest training), by letting K = ⌊(N − 1)/3⌋ and T = 1. Case 2 (equal parallelization and privacy gain): split resources almost equally between parallelization and privacy, i.e., T = ⌊(N − 3)/6⌋ and K = ⌊(N + 2)/3⌋ − T. In all schemes, we apply the MPC truncation protocol from Section 3 to carry out the multiplication with η/m during model updates, by choosing (k1, k2) = (21, 24) and (22, 24) for the CIFAR-10 and GISETTE datasets, respectively.
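
To make the parameter-selection rule in the Experiment Setup excerpt concrete, the following minimal Python sketch (hypothetical; the authors' code is not publicly released) computes (K, T) for Case 1 and Case 2 given a number of clients N and checks the Theorem 1 constraint N ≥ (2r + 1)(K + T − 1) + 1. The example client counts and the fitting interval used for the degree-one sigmoid approximation are illustrative assumptions, not values reported in the paper.

    # Hypothetical sketch of the parameter selection quoted above; the paper's own
    # implementation is not public. Uses r = 1 (degree-one sigmoid approximation).
    import numpy as np

    def case1_params(N, r=1):
        # Case 1: maximum parallelization gain -- T = 1, largest K allowed by Theorem 1.
        T = 1
        K = (N - 1) // (2 * r + 1)
        return K, T

    def case2_params(N):
        # Case 2 (as quoted, for r = 1): split resources between parallelization and privacy.
        T = (N - 3) // 6               # T = floor((N - 3) / 6)
        K = (N + 2) // 3 - T           # K = floor((N + 2) / 3) - T
        return K, T

    def satisfies_theorem1(N, K, T, r=1):
        return N >= (2 * r + 1) * (K + T - 1) + 1

    # Degree-one least-squares approximation of the sigmoid over an assumed interval
    # [-4, 4]; the paper's fitting interval and coefficients are not given in the excerpt.
    z = np.linspace(-4.0, 4.0, 1000)
    c1, c0 = np.polyfit(z, 1.0 / (1.0 + np.exp(-z)), 1)   # sigmoid(z) ~ c1 * z + c0

    if __name__ == "__main__":
        for N in (10, 20, 30, 40, 50):   # example client counts (assumed, not from the paper)
            K1, T1 = case1_params(N)
            K2, T2 = case2_params(N)
            assert satisfies_theorem1(N, K1, T1) and satisfies_theorem1(N, K2, T2)
            print(f"N={N:2d}  Case 1: (K, T)=({K1}, {T1})   Case 2: (K, T)=({K2}, {T2})")
        print(f"degree-one sigmoid approximation: sigmoid(z) ~ {c1:.3f}*z + {c0:.3f}")

For example, with N = 40 clients this sketch yields (K, T) = (13, 1) in Case 1 and (8, 6) in Case 2, both of which satisfy the Theorem 1 bound for r = 1.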