Oblivious Sampling Algorithms for Private Data Analysis

Authors: Sajin Sasy, Olga Ohrimenko

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimentally we show that accuracy of models trained with shuffling and sampling is the same for differentially private models for MNIST and CIFAR-10, while sampling provides stronger privacy guarantees than shuffling. We use TensorFlow v1.13 and the TensorFlow Privacy library [5] for DP training. We implement non-oblivious SWO and Poisson sampling mechanisms since the accuracy of the training procedure is independent of the sampling implementation. We report an average of 5 runs for each experiment."
Researcher Affiliation | Collaboration | Sajin Sasy (University of Waterloo); Olga Ohrimenko (Microsoft Research)
Pseudocode | Yes | "Algorithm 1 Oblivious samples SWO(D, m): takes an encrypted dataset D and returns k = n/m SWO samples of size m, where n = |D|."
Open Source Code | No | The paper mentions using a third-party library, TensorFlow Privacy [5], and provides its link; it does not provide source code for the novel algorithms developed in the paper.
Open Datasets | Yes | "MNIST dataset contains 60,000 train and 20,000 test images of ten digits with the classification task of determining which digit an image corresponds to. CIFAR-10 dataset consists of 50,000 training and 10,000 test color images classified into 10 classes [1]." [1] CIFAR datasets, http://www.cs.toronto.edu/~kriz/cifar.html
Dataset Splits | No | The paper specifies training and test set sizes (60,000 train / 20,000 test for MNIST; 50,000 train / 10,000 test for CIFAR-10), but it does not mention a separate validation set or specific validation splits.
Hardware Specification | No | The paper mentions general hardware capabilities, such as Trusted Execution Environments (TEEs) and Intel SGX, for the proposed framework, but it does not specify the exact hardware (e.g., CPU, GPU models, memory) used to run the experiments reported in Section 5.
Software Dependencies | Yes | "We use TensorFlow v1.13 and the TensorFlow Privacy library [5] for DP training."
Experiment Setup | Yes | "We set the clipping parameter to 4, σ = 6, δ = 10^-5. For each sampling mechanism we use a different privacy accountant to compute the exact total ϵ, as opposed to the asymptotic guarantees in Table 1. For the first two we use batch size m = 600, γ = 0.01 and m = 200, γ = 0.003 in Figure 1. Each network is trained for 100 epochs. Each network is trained for 100 epochs with a sample size of m = 2000."
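The SWO (sampling without replacement) mechanism described in the Pseudocode row above partitions a dataset of n records into k = n/m samples of size m. The following is a minimal non-oblivious sketch of that partitioning; the paper's Algorithm 1 additionally hides memory access patterns inside a TEE, which this hypothetical `swo_samples` helper does not attempt.

```python
import random

def swo_samples(dataset, m, seed=None):
    """Partition `dataset` into k = n/m samples of size m, drawn
    without replacement. Non-oblivious sketch: consecutive blocks
    of a uniformly random permutation are SWO samples."""
    n = len(dataset)
    assert n % m == 0, "n must be divisible by m"
    rng = random.Random(seed)
    shuffled = list(dataset)   # copy so the input is untouched
    rng.shuffle(shuffled)      # uniform random permutation
    return [shuffled[i:i + m] for i in range(0, n, m)]
```

As the report notes, the model-accuracy experiments are independent of whether the sampling is implemented obliviously, which is why a plain in-memory version like this suffices for reproducing the training results.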
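The Poisson sampling mechanism referenced in the experiment setup (with rates γ = 0.01 and γ = 0.003) includes each record independently with probability γ. A minimal non-oblivious sketch, using a hypothetical `poisson_sample` helper name:

```python
import random

def poisson_sample(dataset, gamma, seed=None):
    """Return one Poisson sample of `dataset`: each record is
    included independently with probability `gamma`. Sketch only;
    the paper's version is made oblivious inside a TEE."""
    rng = random.Random(seed)
    return [x for x in dataset if rng.random() < gamma]
```

Note that, unlike SWO, the size of a Poisson sample is itself random (binomial with mean γ·n), which is why the two mechanisms require different privacy accountants to compute the exact total ϵ.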