Bounding Training Data Reconstruction in Private (Deep) Learning

Authors: Chuan Guo, Brian Karrer, Kamalika Chaudhuri, Laurens van der Maaten

ICML 2022

For each reproducibility variable below, the assessed result is followed by the supporting LLM response.

Research Type: Experimental
"We evaluate our MSE lower bounds in Theorem 1 and Theorem 2 for unbiased estimators and show that RDP and FIL both provide meaningful semantic guarantees against DRAs. In addition, we evaluate the informed adversary attack (Balle et al., 2022) against privately trained models and show that a sample's vulnerability to this reconstruction attack is closely captured by the FIL lower bound. Code to reproduce our results is available at https://github.com/facebookresearch/bounding_data_reconstruction. We first consider linear logistic regression for binary MNIST (LeCun et al., 1998) classification of digits 0 vs. 1. The plot shows that the RDP bound is 0.1, while all the per-sample dFIL bounds are > 1. Finally, we compare MSE lower bounds for RDP and FIL accounting for the private SGD learner. We train two distinct convolutional networks on the full 10-digit MNIST (LeCun et al., 1998) dataset and the CIFAR-10 (Krizhevsky et al., 2009) dataset."

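The MSE guarantee referenced in this excerpt is Cramér-Rao-style: against an unbiased reconstruction attack, the per-dimension MSE is bounded below by 1/dFIL, where dFIL is the trace of the Fisher information matrix divided by the input dimension d. The JAX sketch below is not the authors' released code; it illustrates the quantity for a Gaussian mechanism M(x) = f(x) + N(0, σ²I), with a hypothetical toy mechanism f.

```python
import jax
import jax.numpy as jnp

def dfil_gaussian(f, x, sigma):
    """dFIL of the Gaussian mechanism M(x) = f(x) + N(0, sigma^2 I) at x.

    For this mechanism the Fisher information matrix of the output with
    respect to the input is J_f(x)^T J_f(x) / sigma^2, so its trace is
    ||J_f(x)||_F^2 / sigma^2; dFIL divides that trace by the input dimension.
    """
    jac = jax.jacobian(f)(x)                    # shape (output_dim, input_dim)
    trace_fim = jnp.sum(jac ** 2) / sigma ** 2  # tr(FIM) = ||J_f||_F^2 / sigma^2
    return trace_fim / x.size

# Toy mechanism (hypothetical): release a nonlinear feature of x plus noise.
f = lambda x: jnp.tanh(x @ jnp.ones((4, 2)))
x = jnp.ones(4)
eta = dfil_gaussian(f, x, sigma=1.0)
print("dFIL:", eta, "-> unbiased-attack MSE lower bound:", 1.0 / eta)
```
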
Researcher Affiliation: Industry
"¹Meta AI, ²Meta. Correspondence to: Chuan Guo <chuanguo@fb.com>."

Pseudocode: Yes
"Algorithm 1: FIL computation for private SGD."

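Algorithm 1 itself is not reproduced in this report. As a rough, heavily simplified sketch of the idea (assuming a linear model with squared-error loss, no gradient clipping, and plain additive composition of per-step Fisher information; none of this code is from the paper), each noisy SGD step releases a Gaussian-perturbed gradient sum, so every sample's accumulated Fisher information trace grows by the squared Frobenius norm of the Jacobian of its gradient contribution with respect to the sample, divided by σ²:

```python
import jax
import jax.numpy as jnp

def per_example_grad(theta, x, y):
    # Per-example gradient of a squared-error loss on a linear model
    # (hypothetical; the paper's Algorithm 1 also applies gradient clipping).
    loss = lambda th: 0.5 * (jnp.dot(th, x) - y) ** 2
    return jax.grad(loss)(theta)

def fil_sgd(theta, xs, ys, sigma, lr, steps, key):
    d = theta.size
    fisher_trace = jnp.zeros(len(xs))  # running tr(FIM) for each sample
    for _ in range(steps):
        key, sub = jax.random.split(key)
        grads = jax.vmap(per_example_grad, in_axes=(None, 0, 0))(theta, xs, ys)
        # Jacobian of each sample's gradient contribution w.r.t. that sample.
        jacs = jax.vmap(jax.jacobian(per_example_grad, argnums=1),
                        in_axes=(None, 0, 0))(theta, xs, ys)
        fisher_trace += jax.vmap(lambda J: jnp.sum(J ** 2))(jacs) / sigma ** 2
        noise = sigma * jax.random.normal(sub, theta.shape)
        theta = theta - lr * (grads.sum(axis=0) + noise)
    return theta, fisher_trace / d  # per-sample dFIL after training

# Usage on toy data:
key = jax.random.PRNGKey(0)
xs, ys = jax.random.normal(key, (8, 5)), jnp.ones(8)
theta, dfil = fil_sgd(jnp.zeros(5), xs, ys, sigma=1.0, lr=0.1, steps=3, key=key)
print(dfil)  # the per-dimension MSE lower bound for sample i is 1 / dfil[i]
```
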
Open Source Code: Yes
"Code to reproduce our results is available at https://github.com/facebookresearch/bounding_data_reconstruction."

Open Datasets: Yes
"We first consider linear logistic regression for binary MNIST (LeCun et al., 1998) classification of digits 0 vs. 1. The training set contains n = 12,665 samples. Finally, we compare MSE lower bounds for RDP and FIL accounting for the private SGD learner. We train two distinct convolutional networks on the full 10-digit MNIST (LeCun et al., 1998) dataset and the CIFAR-10 (Krizhevsky et al., 2009) dataset."

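The binary task is a subset of the standard MNIST training split. The sketch below uses torchvision (not named in the paper, but the usual way to load MNIST under PyTorch); digits 0 and 1 contribute 5,923 and 6,742 training images respectively, which matches the quoted n = 12,665.

```python
import torch
from torchvision import datasets, transforms

train = datasets.MNIST("data", train=True, download=True,
                       transform=transforms.ToTensor())
# Keep only digits 0 and 1 for the binary logistic regression task.
mask = (train.targets == 0) | (train.targets == 1)
images, labels = train.data[mask].float() / 255.0, train.targets[mask]
print(images.shape, len(labels))  # torch.Size([12665, 28, 28]) 12665
```
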
Dataset Splits: No
The paper mentions training and testing but does not explicitly describe validation splits (e.g., percentages or counts for training, validation, and test sets) or the methodology for creating them.

Hardware Specification: No
The paper does not explicitly describe the hardware used for its experiments. It mentions software libraries such as PyTorch and JAX, but gives no GPU, CPU, or other hardware details.

Software Dependencies: No
The paper mentions "PyTorch (Paszke et al., 2019; Horace He, 2021) and JAX (Bradbury et al., 2018)", but does not provide version numbers for these dependencies or for any other key software components.

Experiment Setup: Yes
"Private SGD has several hyperparameters, and we exhaustively test all setting combinations to produce the scatter plots in Figure 5. Table 3 and Table 4 give the choice of values that we considered for each hyperparameter."

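The exhaustive sweep described above amounts to a Cartesian product over per-hyperparameter grids. A minimal sketch, with hypothetical grid values standing in for the actual choices listed in the paper's Table 3 and Table 4:

```python
import itertools

# Hypothetical grids; the actual values are given in Tables 3 and 4.
grid = {
    "lr": [0.01, 0.1, 1.0],    # learning rate
    "sigma": [0.5, 1.0, 2.0],  # Gaussian noise multiplier
    "clip": [0.1, 1.0],        # per-sample gradient clipping norm
    "epochs": [10, 50],
}

for values in itertools.product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    # Each config would train one private SGD model and contribute one
    # (accuracy, reconstruction bound) point to a Figure 5-style scatter plot.
    print(config)
```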