Oblivious Sampling Algorithms for Private Data Analysis
Authors: Sajin Sasy, Olga Ohrimenko
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally we show that accuracy of models trained with shuffling and sampling is the same for differentially private models for MNIST and CIFAR-10, while sampling provides stronger privacy guarantees than shuffling. We use TensorFlow v1.13 and TensorFlow Privacy library [5] for DP training. We implement non-oblivious SWO and Poisson sampling mechanisms since accuracy of the training procedure is independent of sampling implementation. We report an average of 5 runs for each experiment. |
| Researcher Affiliation | Collaboration | Sajin Sasy University of Waterloo Olga Ohrimenko Microsoft Research |
| Pseudocode | Yes | Algorithm 1 Oblivious samples SWO(D, m): takes an encrypted dataset D and returns k = n/m SWO samples of size m, where n = \|D\|. (A non-oblivious sampling sketch appears after this table.) |
| Open Source Code | No | The paper mentions using a third-party library, TensorFlow Privacy [5], and provides its link. However, it does not provide source code for the novel algorithms developed in this paper. |
| Open Datasets | Yes | MNIST dataset contains 60,000 train and 20,000 test images of ten digits with the classification tasks of determining which digit an image corresponds to. CIFAR-10 dataset consists of 50,000 training and 10,000 test color images classified into 10 classes [1]. [1] CIFAR datasets. http://www.cs.toronto.edu/~kriz/cifar.html |
| Dataset Splits | No | The paper specifies training and test set sizes (e.g., 60,000 train and 20,000 test for MNIST, 50,000 training and 10,000 test for CIFAR-10), but it does not explicitly mention a separate validation set or specific splits for validation purposes. |
| Hardware Specification | No | The paper mentions general hardware capabilities like Trusted Execution Environments (TEEs) and Intel SGX for the proposed framework, but it does not specify the exact hardware (e.g., CPU, GPU models, memory) used to run the experiments reported in Section 5. |
| Software Dependencies | Yes | We use TensorFlow v1.13 and TensorFlow Privacy library [5] for DP training. |
| Experiment Setup | Yes | We set the clipping parameter to 4, σ = 6, δ = 10⁻⁵. For each sampling mechanism we use a different privacy accountant to compute the exact total ϵ as opposed to the asymptotic guarantees in Table 1. For the first two we use batch size m = 600, γ = 0.01 and m = 200, γ = 0.003 in Figure 1. Each network is trained for 100 epochs. Each network is trained for 100 epochs with sample size of m = 2000. (See the DP-SGD configuration sketch after this table.) |
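
The paper's Algorithm 1 is an oblivious construction (intended for TEEs such as Intel SGX), while the experiments only require non-oblivious equivalents, since accuracy is independent of how the samples are produced. The sketch below is a minimal NumPy illustration of the two sampling interfaces, assuming each SWO sample is an independent draw of m records without replacement and that a Poisson sample keeps each record independently with probability γ; the function names `swo_samples` and `poisson_sample` are hypothetical, not from the paper.

```python
# Minimal, non-oblivious sketch of the two sampling mechanisms used in the
# experiments. Assumptions (not from the paper): each SWO sample is an
# independent draw of m records without replacement, and a Poisson sample
# keeps each record independently with probability gamma.
import numpy as np


def swo_samples(data, m, k=None, rng=None):
    """Return k = n // m samples of size m, each drawn without replacement."""
    rng = rng or np.random.default_rng()
    n = len(data)
    k = n // m if k is None else k
    return [data[rng.choice(n, size=m, replace=False)] for _ in range(k)]


def poisson_sample(data, gamma, rng=None):
    """Return one Poisson sample: each record kept with probability gamma."""
    rng = rng or np.random.default_rng()
    return data[rng.random(len(data)) < gamma]


# Toy usage mirroring the MNIST settings from the table (m = 600, gamma = 0.01).
records = np.arange(60_000)
samples = swo_samples(records, m=600)
print(len(samples), len(samples[0]))          # 100 samples of 600 records each
print(len(poisson_sample(records, 0.01)))     # ~600 records in expectation
```

The oblivious versions in the paper additionally hide memory access patterns when sampling runs inside a TEE; because model accuracy does not depend on which implementation produces the samples, the non-oblivious versions above suffice for reproducing the training experiments.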
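
Below is a hedged sketch of the DP-SGD configuration quoted in the Experiment Setup row, using TensorFlow Privacy. The hyperparameters (clipping norm 4, σ = 6, δ = 10⁻⁵, batch size m = 600, 100 epochs) come from the table; the learning rate, microbatch choice, and import paths are assumptions (the library's module layout has changed across releases), and the standard `compute_dp_sgd_privacy` helper assumes Poisson subsampling, whereas the paper applies a separate accountant per sampling mechanism.

```python
# Hedged sketch of the DP-SGD setup quoted in the table: clipping norm 4,
# noise multiplier sigma = 6, delta = 1e-5, batch size m = 600, 100 epochs.
# Import paths follow the 2019-era TensorFlow Privacy layout and may differ
# in newer releases; learning rate and microbatch choice are illustrative.
from tensorflow_privacy.privacy.optimizers.dp_optimizer import (
    DPGradientDescentGaussianOptimizer)
from tensorflow_privacy.privacy.analysis.compute_dp_sgd_privacy import (
    compute_dp_sgd_privacy)

N_TRAIN = 60_000          # MNIST training set size
BATCH_SIZE = 600          # m for MNIST
L2_NORM_CLIP = 4.0        # clipping parameter from the paper
NOISE_MULTIPLIER = 6.0    # sigma from the paper
EPOCHS = 100
DELTA = 1e-5

# DP optimizer used in place of a vanilla SGD optimizer in the training loop
# (model definition, loss, and the loop itself are omitted here).
optimizer = DPGradientDescentGaussianOptimizer(
    l2_norm_clip=L2_NORM_CLIP,
    noise_multiplier=NOISE_MULTIPLIER,
    num_microbatches=BATCH_SIZE,   # one microbatch per example
    learning_rate=0.1)             # illustrative value, not from the paper

# Standard helper that reports (epsilon, optimal RDP order) assuming Poisson
# subsampling with rate BATCH_SIZE / N_TRAIN; the paper instead uses a
# separate accountant per sampling mechanism (SWO vs. Poisson) to report
# exact epsilons rather than the asymptotic bounds in its Table 1.
eps, opt_order = compute_dp_sgd_privacy(
    n=N_TRAIN, batch_size=BATCH_SIZE,
    noise_multiplier=NOISE_MULTIPLIER, epochs=EPOCHS, delta=DELTA)
print(f"epsilon = {eps:.2f} at delta = {DELTA}")
```

The CIFAR-10 runs would swap in the corresponding values from the table (e.g., sample size m = 2000 over 100 epochs); everything else in the sketch stays the same.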