Large Scale Private Learning via Low-rank Reparametrization

Authors: Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, Tie-Yan Liu

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on various kinds of tasks to demonstrate the effectiveness of RGP. We first examine the utility of models trained by RGP. To this end, we apply RGP to the wide ResNet (Zagoruyko & Komodakis, 2016) and BERT (Devlin et al., 2018) models, which are representative models for computer vision and natural language modeling. The results are presented in Sections 5.1 and 5.2.
Researcher Affiliation | Collaboration | Da Yu (1,2), Huishuai Zhang (2), Wei Chen (2), Jian Yin (1), Tie-Yan Liu (2). (1) The School of Data and Computer Science & Guangdong Key Laboratory of Big Data Analysis and Processing, Sun Yat-sen University, Guangdong, China. (2) Microsoft Research Asia, Beijing, China. The work was done while D. Yu was an intern at Microsoft Research Asia.
Pseudocode | Yes | The pseudocode of RGP is presented in Algorithm 1. RGP proceeds layer by layer, and we ignore the layer index for simplicity in the following discussion. Algorithm 2: Decomposition via Power Method. (A power-method sketch follows the table below.)
Open Source Code | Yes | The source code of our implementation is publicly available (footnote 1: https://github.com/dayu11/Differentially-Private-Deep-Learning).
Open Datasets | Yes | The dataset for the BERT model is SST-2 from the GLUE benchmark (Wang et al., 2018a). The dataset for the wide ResNet model is CIFAR-10 (Krizhevsky & Hinton, 2009). We use two vision datasets: SVHN (Netzer et al., 2011) and CIFAR-10 (Krizhevsky & Hinton, 2009).
Dataset Splits | Yes | We use two vision datasets: SVHN (Netzer et al., 2011) and CIFAR-10 (Krizhevsky & Hinton, 2009). We use four tasks from the General Language Understanding Evaluation (GLUE) benchmark (Wang et al., 2018a): MNLI, QQP, QNLI, and SST-2.
Hardware Specification | Yes | All experiments are run on a node with four Tesla V100 GPUs.
Software Dependencies | No | The paper mentions 'We use an open-source tool of moments accountant to compute the privacy loss', with footnote 2 linking to https://github.com/tensorflow/privacy. It also references Opacus (Opacus, 2020; https://github.com/pytorch/opacus), but does not specify exact version numbers for these or other software dependencies.
Experiment Setup | Yes | Implementation. The number of iterations for the power method is 1. We use an open-source tool of moments accountant to compute the privacy loss (footnote 2: https://github.com/tensorflow/privacy). For a given setting of hyperparameters, we set σ to the smallest value such that the privacy budget allows running the desired number of epochs. Hyperparameters. We follow the hyperparameters in Zagoruyko & Komodakis (2016) except for a mini-batch size of 1000; this is larger than the default because the averaging effect of a large mini-batch reduces the noise variance. The reparametrization rank r is chosen from {1, 2, 4, 8, 16}. We choose the privacy parameter δ < 1/n, and set δ = 10^-6 for SVHN and δ = 10^-5 for CIFAR-10. (A sketch of the σ selection follows the table below.)
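
The Pseudocode row cites Algorithm 2, a decomposition via the power method, and the Experiment Setup row states that only one power-method iteration is used per update. As a rough, non-authoritative sketch (not the authors' implementation), the following PyTorch snippet factors a matrix into rank-r carriers L and R with a few power-method iterations; the function name, the optional warm-start argument, and the use of torch.linalg.qr for orthonormalization are our own choices.

    import torch

    def power_method_lowrank(A, rank, n_iter=1, R_init=None):
        # Decompose A (d_out x d_in) into carriers L (d_out x rank) and R (d_in x rank)
        # so that A is approximately L @ R.T. A single iteration with a warm start
        # mirrors the one power-method iteration reported in the Experiment Setup row.
        d_out, d_in = A.shape
        R = torch.randn(d_in, rank, device=A.device) if R_init is None else R_init
        for _ in range(n_iter):
            L = A @ R                   # project onto the current rank-r subspace
            L, _ = torch.linalg.qr(L)   # orthonormalize the left carrier
            R = A.t() @ L               # refit the right carrier
        return L, R

    # Illustrative usage on a random matrix with reparametrization rank 4.
    W = torch.randn(256, 128)
    L, R = power_method_lowrank(W, rank=4)
    rel_error = torch.linalg.norm(W - L @ R.t()) / torch.linalg.norm(W)

Orthonormalizing L before refitting R makes L @ R.T the best approximation of A within the column span of L; warm-starting from the previous step's factor is presumably why a single iteration suffices in practice.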
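The Experiment Setup row also says σ is set to the smallest value for which the privacy budget still permits the desired number of epochs, with the privacy loss computed by the moments accountant from https://github.com/tensorflow/privacy. Below is a minimal sketch of that selection as a binary search; epsilon_of is a hypothetical callable standing in for whichever accountant is used, and the search bounds and tolerance are illustrative rather than taken from the paper.

    def smallest_sigma(eps_target, delta, epochs, n, batch_size, epsilon_of,
                       lo=0.3, hi=50.0, tol=1e-3):
        # Binary search for the smallest noise multiplier sigma whose accumulated
        # privacy loss epsilon_of(sigma, q, steps, delta) stays within eps_target.
        # epsilon_of is assumed to wrap a moments/RDP accountant; it is not defined here.
        steps = epochs * (n // batch_size)   # total number of noisy updates
        q = batch_size / n                   # per-step subsampling rate
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if epsilon_of(mid, q, steps, delta) <= eps_target:
                hi = mid   # budget satisfied, try less noise
            else:
                lo = mid   # budget exceeded, need more noise
        return hi

For CIFAR-10 (50,000 training examples) with the quoted mini-batch size of 1000 and δ = 10^-5, this would be called with n=50000, batch_size=1000, delta=1e-5, which also satisfies the stated constraint δ < 1/n = 2 × 10^-5.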