Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data

Authors: Nicolas Papernot, Martín Abadi, Úlfar Erlingsson, Ian Goodfellow, Kunal Talwar

ICLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments use MNIST and the extended SVHN datasets. Our MNIST model stacks two convolutional layers with max-pooling and one fully connected layer with ReLUs. When trained on the entire dataset, the non-private model has a 99.18% test accuracy. (A hedged sketch of this architecture follows the table.)
Researcher Affiliation | Collaboration | Nicolas Papernot, Pennsylvania State University, ngp5056@cse.psu.edu; Martín Abadi, Google Brain, abadi@google.com; Úlfar Erlingsson, Google, ulfar@google.com; Ian Goodfellow, Google Brain, goodfellow@google.com; Kunal Talwar, Google Brain, kunal@google.com
Pseudocode | No | The paper describes the proposed method in prose and figures, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The source code for reproducing the results in this section is available on GitHub. [Footnote 2: https://github.com/tensorflow/models/tree/master/differential_privacy/multiple_teachers]
Open Datasets | Yes | Our experiments use MNIST and the extended SVHN datasets. [...] Our MNIST model stacks two convolutional layers with max-pooling and one fully connected layer with ReLUs. [...] For SVHN, we add two hidden layers. [...] Appendix C: UCI Adult dataset: https://archive.ics.uci.edu/ml/datasets/Adult; UCI Diabetes dataset: https://archive.ics.uci.edu/ml/datasets/Diabetes+130-US+hospitals+for+years+1999-2008
Dataset Splits | Yes | In the case of MNIST, the student has access to 9,000 samples, among which a subset of either 100, 500, or 1,000 samples are labeled using the noisy aggregation mechanism discussed in Section 2.1. Its performance is evaluated on the 1,000 remaining samples of the test set. [...] For SVHN, the student has access to 10,000 training inputs, among which it labels 500 or 1,000 samples using the noisy aggregation mechanism. Its performance is evaluated on the remaining 16,032 samples.
Hardware Specification | No | The paper does not specify the hardware used for running the experiments (e.g., specific GPU or CPU models).
Software Dependencies | No | The paper mentions using TensorFlow (implicitly, via links to TensorFlow tutorials and GitHub repositories) and the scikit-learn Python package for random forests, but it does not provide version numbers for any software dependencies.
Experiment Setup | Yes | Our MNIST model stacks two convolutional layers with max-pooling and one fully connected layer with ReLUs. When trained on the entire dataset, the non-private model has a 99.18% test accuracy. For SVHN, we add two hidden layers. [...] We use a Laplacian scale of 20 to guarantee an individual query privacy bound of ε = 0.05. [...] We train a student random forest on these 500 test set inputs and evaluate it on the last 11,282 test set inputs for the Adult dataset, and 6,352 test set inputs for the Diabetes dataset. [...] For both datasets, we train ensembles of n = 250 random forests on partitions of the training data. [...] In Appendix C, for random forests, the 'number of estimators, which we set to 100'. (Hedged sketches of the noisy aggregation step and the random-forest teacher setup follow the table.)
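
The quoted MNIST architecture (two convolutional layers with max-pooling and one fully connected ReLU layer) can be sketched as below. This is a minimal, hedged reconstruction: the filter counts, kernel sizes, dense-layer width, and logits head are assumptions, since the quoted text does not specify them.

# Hedged sketch of the MNIST model described in the table: two convolutional
# layers with max-pooling followed by one fully connected ReLU layer.
# Hyperparameters (32/64 filters, 5x5 kernels, 256 hidden units) are assumptions.
import tensorflow as tf

def build_mnist_model(num_classes: int = 10) -> tf.keras.Model:
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(32, 5, activation="relu", padding="same"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(64, 5, activation="relu", padding="same"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(num_classes),  # unnormalized logits
    ])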
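
The noisy aggregation mechanism referenced in the Dataset Splits and Experiment Setup rows adds Laplacian noise to the teachers' per-class vote counts before taking the argmax. A minimal sketch follows, assuming teacher predictions arrive as integer labels; the scale of 20 matches the quoted per-query bound of ε = 0.05 (scale = 1/ε), but the function name and interface are hypothetical.

# Minimal sketch of noisy-max aggregation: count the teachers' votes per
# class, perturb each count with Laplacian noise, and return the noisy argmax.
import numpy as np

def noisy_aggregate(teacher_labels: np.ndarray,
                    num_classes: int,
                    laplace_scale: float = 20.0,
                    rng: np.random.Generator | None = None) -> int:
    """Return the class with the highest noisy vote count."""
    rng = rng or np.random.default_rng()
    votes = np.bincount(teacher_labels, minlength=num_classes).astype(float)
    votes += rng.laplace(loc=0.0, scale=laplace_scale, size=num_classes)
    return int(np.argmax(votes))

For each student query, the teachers' predicted labels for that single input would be collected into an array and passed to this function, so the student only ever sees the noisy aggregate label, never an individual teacher's vote.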
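
For the UCI Adult and Diabetes experiments, the quoted setup trains an ensemble of n = 250 random forests, each with 100 estimators, on partitions of the training data. The sketch below, assuming disjoint random partitions and scikit-learn's RandomForestClassifier, illustrates one way to set this up; the partitioning strategy and the helper name are assumptions.

# Hedged sketch of the teacher ensemble for the UCI experiments: 250 random
# forests with 100 trees each, trained on disjoint partitions of the data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_teacher_forests(X: np.ndarray, y: np.ndarray,
                          n_teachers: int = 250,
                          n_estimators: int = 100,
                          seed: int = 0) -> list[RandomForestClassifier]:
    rng = np.random.default_rng(seed)
    partitions = np.array_split(rng.permutation(len(X)), n_teachers)
    teachers = []
    for i, idx in enumerate(partitions):
        rf = RandomForestClassifier(n_estimators=n_estimators,
                                    random_state=seed + i)
        rf.fit(X[idx], y[idx])
        teachers.append(rf)
    return teachers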