Bounding Training Data Reconstruction in Private (Deep) Learning
Authors: Chuan Guo, Brian Karrer, Kamalika Chaudhuri, Laurens van der Maaten
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our MSE lower bounds in Theorem 1 and Theorem 2 for unbiased estimators and show that RDP and FIL both provide meaningful semantic guarantees against DRAs. In addition, we evaluate the informed adversary attack (Balle et al., 2022) against privately trained models and show that a sample's vulnerability to this reconstruction attack is closely captured by the FIL lower bound. Code to reproduce our results is available at https://github.com/facebookresearch/bounding_data_reconstruction. We first consider linear logistic regression for binary MNIST (LeCun et al., 1998) classification of digits 0 vs. 1. The plot shows that the RDP bound is 0.1, while all the per-sample dFIL bounds are > 1. Finally, we compare MSE lower bounds for RDP and FIL accounting for the private SGD learner. We train two distinct convolutional networks on the full 10-digit MNIST (LeCun et al., 1998) dataset and the CIFAR-10 (Krizhevsky et al., 2009) dataset. (See the first sketch below the table for an illustration of the FIL-based MSE bound.) |
| Researcher Affiliation | Industry | ¹Meta AI, ²Meta. Correspondence to: Chuan Guo <chuanguo@fb.com>. |
| Pseudocode | Yes | Algorithm 1 FIL computation for private SGD. (A hedged sketch of this accounting appears in the second code block below the table.) |
| Open Source Code | Yes | Code to reproduce our results is available at https://github.com/facebookresearch/bounding_data_reconstruction. |
| Open Datasets | Yes | We first consider linear logistic regression for binary MNIST (LeCun et al., 1998) classification of digits 0 vs. 1. The training set contains n = 12,665 samples. Finally, we compare MSE lower bounds for RDP and FIL accounting for the private SGD learner. We train two distinct convolutional networks on the full 10-digit MNIST (LeCun et al., 1998) dataset and the CIFAR-10 (Krizhevsky et al., 2009) dataset. |
| Dataset Splits | No | The paper mentions training and testing but does not explicitly provide details about validation dataset splits (e.g., percentages or counts for training, validation, and test sets) or the methodology for creating them. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for its experiments. It mentions software libraries like PyTorch and JAX, but no specific GPU, CPU, or other hardware details. |
| Software Dependencies | No | The paper mentions "PyTorch (Paszke et al., 2019; Horace He, 2021) and JAX (Bradbury et al., 2018)", but does not provide specific version numbers for these software dependencies, nor for any other key software components. |
| Experiment Setup | Yes | Private SGD has several hyperparameters, and we exhaustively test all setting combinations to produce the scatter plots in Figure 5. Table 3 and Table 4 give the choice of values that we considered for each hyperparameter. |
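
To make the quoted FIL result concrete, here is a minimal sketch of the diagonal Fisher information leakage (dFIL) bound for a Gaussian mechanism. By the Cramér-Rao argument the paper uses, any unbiased reconstruction attack satisfies E[||x̂ - x||²/d] ≥ 1/dFIL. The helper name `dfil_gaussian` and the example numbers are our assumptions for illustration, not code from the released repository.

```python
# Minimal sketch (our illustration, not the authors' released code) of the
# dFIL -> MSE bound for the Gaussian mechanism M(x) = f(x) + N(0, sigma^2 I).
# Its Fisher information matrix about x is J^T J / sigma^2 with J = df/dx,
# so dFIL = tr(J^T J) / (d * sigma^2), and for any unbiased attack the
# Cramer-Rao argument gives E[ ||x_hat - x||^2 / d ] >= 1 / dFIL.
import numpy as np

def dfil_gaussian(jacobian: np.ndarray, sigma: float) -> float:
    """Hypothetical helper: diagonal FIL given the Jacobian J = df/dx."""
    d = jacobian.shape[1]
    return float(np.sum(jacobian ** 2)) / (d * sigma ** 2)  # tr(J^T J)/(d s^2)

d, sigma = 784, 2.0           # e.g., a flattened 28x28 MNIST image
J = np.eye(d)                 # identity map: the mechanism releases x + noise
eta = dfil_gaussian(J, sigma)
print(f"dFIL = {eta:.3f}; unbiased-attack MSE >= {1.0 / eta:.3f}")
# dFIL = 1/sigma^2 = 0.25, so per-coordinate MSE is at least sigma^2 = 4.0
```

Larger noise lowers dFIL and raises the MSE floor, which is the trade-off the quoted experiments sweep when comparing RDP and FIL bounds.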
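
The Pseudocode row refers to Algorithm 1 (FIL computation for private SGD). Below is a hedged JAX sketch of the accounting idea: each DP-SGD step releases the clipped per-example gradient plus Gaussian noise, contributing Fisher information tr(JᵀJ)/(σC)² about the input x, and contributions accumulate across steps. The toy model, function names, and the σ·C noise parameterization are our assumptions; consult the released repository for the authors' exact implementation.

```python
# Hedged JAX sketch of per-sample FIL accounting for private SGD, in the
# spirit of Algorithm 1 (our illustration; see the authors' repository for
# the real implementation). Assumes each step releases clip(grad) + noise
# with noise std sigma * C, so its Fisher information about the input x is
# J^T J / (sigma * C)^2, where J is the Jacobian of the clipped gradient.
import jax
import jax.numpy as jnp

def clip(g, C):
    # Standard DP-SGD clipping: rescale g to norm at most C.
    return g * jnp.minimum(1.0, C / (jnp.linalg.norm(g) + 1e-12))

def clipped_grad(w, x, y, C):
    # Per-example gradient of a toy squared loss, then clipped.
    loss = lambda w: (jnp.dot(w, x) - y) ** 2
    return clip(jax.grad(loss)(w), C)

def step_fisher_trace(w, x, y, C, sigma):
    # Jacobian of the released (clipped) gradient w.r.t. the input x.
    J = jax.jacobian(clipped_grad, argnums=1)(w, x, y, C)
    return jnp.sum(J ** 2) / (sigma * C) ** 2   # tr(J^T J) / (sigma C)^2

key = jax.random.PRNGKey(0)
w, x, y = jnp.ones(3), jnp.array([0.1, -0.2, 0.3]), 1.0
C, sigma, lr, trace = 1.0, 1.0, 0.1, 0.0
for _ in range(10):                        # accumulate over T noisy SGD steps
    trace += step_fisher_trace(w, x, y, C, sigma)
    key, sub = jax.random.split(key)
    g = clipped_grad(w, x, y, C)
    w = w - lr * (g + sigma * C * jax.random.normal(sub, g.shape))

dfil = trace / x.size                      # per-sample diagonal FIL
print("unbiased-attack MSE lower bound:", 1.0 / dfil)
```

Because Fisher information adds over adaptive compositions, the per-step traces are summed before dividing by the input dimension, mirroring how the paper's accountant tracks a single per-sample leakage quantity over training.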