Large Scale Private Learning via Low-rank Reparametrization
Authors: Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, Tie-Yan Liu
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on various kinds of tasks to demonstrate the effectiveness of RGP. We first examine the utility of models trained by RGP. To this end, we apply RGP on the wide ResNet (Zagoruyko & Komodakis, 2016) and the BERT (Devlin et al., 2018) models, which are representative models for computer vision and natural language modeling. The results are presented in Sections 5.1 and 5.2. |
| Researcher Affiliation | Collaboration | Da Yu¹², Huishuai Zhang², Wei Chen², Jian Yin¹, Tie-Yan Liu². ¹The School of Data and Computer Science & Guangdong Key Laboratory of Big Data Analysis and Processing, Sun Yat-sen University, Guangdong, China. The work was done when D. Yu was an intern at Microsoft Research Asia. ²Microsoft Research Asia, Beijing, China. |
| Pseudocode | Yes | The pseudocode of RGP is presented in Algorithm 1. The RGP proceeds for all the layers and we ignore the layer index for simplicity in the following discussion. Algorithm 2 Decomposition via Power Method. |
| Open Source Code | Yes | The source code of our implementation is publicly available1. Footnote 1: https://github.com/dayu11/Differentially-Private-Deep-Learning |
| Open Datasets | Yes | The dataset for the BERT model is SST-2 from the GLUE benchmark (Wang et al., 2018a). The dataset for the wide ResNet model is CIFAR-10 (Krizhevsky & Hinton, 2009). We use two vision datasets: SVHN (Netzer et al., 2011) and CIFAR10 (Krizhevsky & Hinton, 2009). |
| Dataset Splits | Yes | We use two vision datasets: SVHN (Netzer et al., 2011) and CIFAR10 (Krizhevsky & Hinton, 2009). We use four tasks from the General Language Understanding Evaluation (GLUE) benchmark (Wang et al., 2018a), including MNLI, QQP, QNLI, and SST-2. |
| Hardware Specification | Yes | All experiments are run on a node with four Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions 'We use an open-source tool of moments accountant to compute the privacy loss2.' with footnote 2 linking to 'https://github.com/tensorflow/privacy'. It also references 'Opacus (Opacus, 2020). URL https://github.com/pytorch/opacus.' but does not specify exact version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Implementation. The number of iterations for power method is 1. We use an open-source tool of moments accountant to compute the privacy loss². For a given setting of hyperparameters, we set σ to be the smallest value so that the privacy budget is allowable to run desired epochs. Hyperparameters. We follow the hyperparameters in Zagoruyko & Komodakis (2016) except using a mini-batch size 1000. This mini-batch size is larger than the default because the averaging effect of large mini-batch reduces the noise variance. The reparametrization rank r is chosen from {1, 2, 4, 8, 16}. We choose the privacy parameter δ < 1/n, and set δ = 10⁻⁶ for SVHN and δ = 10⁻⁵ for CIFAR10. |