Oblivious Sampling Algorithms for Private Data Analysis
Authors: Sajin Sasy, Olga Ohrimenko
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally we show that accuracy of models trained with shuffling and sampling is the same for differentially private models for MNIST and CIFAR-10, while sampling provides stronger privacy guarantees than shuffling. We use TensorFlow v1.13 and TensorFlow Privacy library [5] for DP training. We implement non-oblivious SWO and Poisson sampling mechanisms since accuracy of the training procedure is independent of sampling implementation. We report an average of 5 runs for each experiment. |
| Researcher Affiliation | Collaboration | Sajin Sasy University of Waterloo Olga Ohrimenko Microsoft Research |
| Pseudocode | Yes | Algorithm 1 Oblivious samples SWO(D, m): takes an encrypted dataset D and returns k = n/m SWO samples of size m, where n = \|D\|. (A non-oblivious sampling sketch appears after this table.) |
| Open Source Code | No | The paper mentions using a third-party library, TensorFlow Privacy [5], and provides its link. However, it does not provide source code for the novel algorithms developed in this paper. |
| Open Datasets | Yes | MNIST dataset contains 60,000 train and 20,000 test images of ten digits with the classification tasks of determining which digit an image corresponds to. CIFAR-10 dataset consists of 50,000 training and 10,000 test color images classified into 10 classes [1]. [1] CIFAR datasets. http://www.cs.toronto.edu/~kriz/cifar.html |
| Dataset Splits | No | The paper specifies training and test set sizes (e.g., 60,000 train and 20,000 test for MNIST, 50,000 training and 10,000 test for CIFAR-10), but it does not explicitly mention a separate validation set or specific splits for validation purposes. |
| Hardware Specification | No | The paper mentions general hardware capabilities like Trusted Execution Environments (TEEs) and Intel SGX for the proposed framework, but it does not specify the exact hardware (e.g., CPU, GPU models, memory) used to run the experiments reported in Section 5. |
| Software Dependencies | Yes | We use TensorFlow v1.13 and TensorFlow Privacy library [5] for DP training. |
| Experiment Setup | Yes | We set the clipping parameter to 4, σ = 6, δ = 10⁻⁵. For each sampling mechanism we use a different privacy accountant to compute the exact total ϵ as opposed to the asymptotic guarantees in Table 1. For the first two we use batch size m = 600, γ = 0.01 and m = 200, γ = 0.003 in Figure 1. Each network is trained for 100 epochs. Each network is trained for 100 epochs with sample size of m = 2000. (See the DP-SGD configuration sketch after this table.) |
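
The paper's Algorithm 1 is an oblivious construction (intended for TEEs such as Intel SGX), while the experiments only require non-oblivious equivalents, since accuracy is independent of how the samples are produced. The sketch below is a minimal NumPy illustration of the two sampling interfaces, assuming each SWO sample is an independent draw of m records without replacement and that a Poisson sample keeps each record independently with probability γ; the function names `swo_samples` and `poisson_sample` are hypothetical, not from the paper.

```python
# Minimal, non-oblivious sketch of the two sampling mechanisms used in the
# experiments. Assumptions (not from the paper): each SWO sample is an
# independent draw of m records without replacement, and a Poisson sample
# keeps each record independently with probability gamma.
import numpy as np


def swo_samples(data, m, k=None, rng=None):
    """Return k = n // m samples of size m, each drawn without replacement."""
    rng = rng or np.random.default_rng()
    n = len(data)
    k = n // m if k is None else k
    return [data[rng.choice(n, size=m, replace=False)] for _ in range(k)]


def poisson_sample(data, gamma, rng=None):
    """Return one Poisson sample: each record kept with probability gamma."""
    rng = rng or np.random.default_rng()
    return data[rng.random(len(data)) < gamma]


# Toy usage mirroring the MNIST settings from the table (m = 600, gamma = 0.01).
records = np.arange(60_000)
samples = swo_samples(records, m=600)
print(len(samples), len(samples[0]))          # 100 samples of 600 records each
print(len(poisson_sample(records, 0.01)))     # ~600 records in expectation
```

The oblivious versions in the paper additionally hide memory access patterns when sampling runs inside a TEE; because model accuracy does not depend on which implementation produces the samples, the non-oblivious versions above suffice for reproducing the training experiments.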
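
Below is a hedged sketch of the DP-SGD configuration quoted in the Experiment Setup row, using TensorFlow Privacy. The hyperparameters (clipping norm 4, σ = 6, δ = 10⁻⁵, batch size m = 600, 100 epochs) come from the table; the learning rate, microbatch choice, and import paths are assumptions (the library's module layout has changed across releases), and the standard `compute_dp_sgd_privacy` helper assumes Poisson subsampling, whereas the paper applies a separate accountant per sampling mechanism.

```python
# Hedged sketch of the DP-SGD setup quoted in the table: clipping norm 4,
# noise multiplier sigma = 6, delta = 1e-5, batch size m = 600, 100 epochs.
# Import paths follow the 2019-era TensorFlow Privacy layout and may differ
# in newer releases; learning rate and microbatch choice are illustrative.
from tensorflow_privacy.privacy.optimizers.dp_optimizer import (
    DPGradientDescentGaussianOptimizer)
from tensorflow_privacy.privacy.analysis.compute_dp_sgd_privacy import (
    compute_dp_sgd_privacy)

N_TRAIN = 60_000          # MNIST training set size
BATCH_SIZE = 600          # m for MNIST
L2_NORM_CLIP = 4.0        # clipping parameter from the paper
NOISE_MULTIPLIER = 6.0    # sigma from the paper
EPOCHS = 100
DELTA = 1e-5

# DP optimizer used in place of a vanilla SGD optimizer in the training loop
# (model definition, loss, and the loop itself are omitted here).
optimizer = DPGradientDescentGaussianOptimizer(
    l2_norm_clip=L2_NORM_CLIP,
    noise_multiplier=NOISE_MULTIPLIER,
    num_microbatches=BATCH_SIZE,   # one microbatch per example
    learning_rate=0.1)             # illustrative value, not from the paper

# Standard helper that reports (epsilon, optimal RDP order) assuming Poisson
# subsampling with rate BATCH_SIZE / N_TRAIN; the paper instead uses a
# separate accountant per sampling mechanism (SWO vs. Poisson) to report
# exact epsilons rather than the asymptotic bounds in its Table 1.
eps, opt_order = compute_dp_sgd_privacy(
    n=N_TRAIN, batch_size=BATCH_SIZE,
    noise_multiplier=NOISE_MULTIPLIER, epochs=EPOCHS, delta=DELTA)
print(f"epsilon = {eps:.2f} at delta = {DELTA}")
```

The CIFAR-10 runs would swap in the corresponding values from the table (e.g., sample size m = 2000 over 100 epochs); everything else in the sketch stays the same.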