Differentially Private and Communication Efficient Collaborative Learning
Authors: Jiahao Ding, Guannan Liang, Jinbo Bi, Miao Pan (pp. 7219–7227)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed methods are evaluated in extensive experiments on real-world datasets and the empirical results validate our theoretical findings. |
| Researcher Affiliation | Academia | ¹University of Houston, ²University of Connecticut; {jding7, mpan2}@uh.edu, {guannan.liang, jinbo.bi}@uconn.edu |
| Pseudocode | Yes | Algorithm 1 Q-DPSGD-1 run by agent i, Algorithm 2 Q-DPSGD-2 run by agent i |
| Open Source Code | No | The paper does not provide any explicit statement about releasing code or a link to a code repository. |
| Open Datasets | Yes | We conduct the experiments over two benchmark datasets: MNIST and CIFAR-10. |
| Dataset Splits | No | The paper mentions training data ("randomly sample 10,000 records for training") and implicitly test data (through performance comparison figures), but does not explicitly describe a validation set or a three-way split for train/validation/test. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper implies a Python-based implementation (as is typical for ML papers) but does not provide specific version numbers for Python, libraries, or other software dependencies. |
| Experiment Setup | Yes | In the experiments, we set the step sizes (α, ε) = (0.3/T^{1/6}, 11/T^{1/2}) for Q-DPSGD-1 and Q-DPSGD-2, and α = 0.2 for DSGD and SDM. Moreover, we also set θ = 0.6 as stated in (Zhang et al. 2020) for SDM. To control the sensitivity of the gradient, we adopt the gradient clipping technique, ∇ℓ(x_{i,t}; θ) ← ∇ℓ(x_{i,t}; θ)/max(1, ‖∇ℓ(x_{i,t}; θ)‖/K). Here, we set K = 0.5 for Q-DPSGD-1, Q-DPSGD-2, and SDM. In each simulation, we randomly sample 10,000 records for training and divide them among n parties, so that each party holds 10000/n data samples (i.e., m = 10000/n). In all experiments, we set δ = 10^{-5}. We also assume the processing speed of each machine follows a uniform distribution V ∼ Uniform(10, 90), and then choose the deadline T_d = B/E[V], where B is the expected batch size used in each machine. We consider a low-precision quantizer in (5) with various quantization levels s, and we denote T_c as the communication time of a p-dimension vector without quantization (16 bits). |
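
The Experiment Setup row combines two standard mechanisms, per-sample gradient clipping with threshold K = 0.5 and an s-level low-precision quantizer, together with a deadline T_d = B/E[V]. The sketch below is a hedged illustration of these mechanisms, not the paper's released code: the function names, the QSGD-style stochastic rounding used in place of the paper's quantizer in Eq. (5), and the batch size B = 128 are assumptions.

```python
import numpy as np

def clip_gradient(grad, K=0.5):
    # Bound the L2 sensitivity of a per-sample gradient:
    # g <- g / max(1, ||g||_2 / K), so ||g||_2 <= K afterwards.
    # K = 0.5 is the clipping threshold reported in the setup above.
    norm = np.linalg.norm(grad)
    return grad / max(1.0, norm / K)

def stochastic_quantize(v, s=4):
    # Generic unbiased s-level stochastic quantizer (QSGD-style sketch);
    # the exact low-precision quantizer in the paper's Eq. (5) may differ.
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return np.zeros_like(v)
    scaled = s * np.abs(v) / norm          # each entry lies in [0, s]
    lower = np.floor(scaled)
    prob_up = scaled - lower               # probability of rounding up
    levels = lower + (np.random.rand(*v.shape) < prob_up)
    return np.sign(v) * norm * levels / s  # unbiased: E[output] = v

# Deadline from the setup: T_d = B / E[V] with V ~ Uniform(10, 90), so E[V] = 50.
B = 128                                    # hypothetical expected batch size per machine
T_d = B / np.mean([10, 90])                # = B / 50
```

As a consistency check on the data split described above, with n = 10 parties each party would hold m = 10000/n = 1000 training samples.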