Differentially Private Set Representations

Authors: Sarvar Patel, Giuseppe Persiano, Joon Young Seo, Kevin Yeo

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental 5 Experimental evaluation Setup. We implemented DPSet, ALP [2] and DP Count Sketch [34] in C++ using 800 lines of code. For DPSet, we use the analysis of Bienstock et al. [7] to choose appropriate parameters for δ 2 40 with parameter β = 0.05. To fit ALP and DP Count Sketch to our problem setting, we round the query results of these mechanisms to the nearest 0 or 1. We target privacy parameter δ 2 40 for all three constructions. To fairly compare utility, we chose parameters to ensure that encoding sizes are approximately equal for all three constructions (see Appendix G for more details on encoding sizes). We consider experiments for input sets of size k {212, 216, 220}. Each trial picks input sets as k uniformly random 128-bit strings from a universe of all n = 2128 strings. Although, all three constructions are agnostic to the distribution of the input set. We ran all experiments using a Ubuntu PC with 12 cores, 3.7 GHz Intel Xeon W-2135 and 64 GB of RAM. Our experiments enable AVX2 and AVX-512 instruction sets with SIMD instructions. All reported results use single-thread execution as the average of at least 1,000 trials with standard deviation less than 10% of the average.
Researcher Affiliation Collaboration Sarvar Patel Google sarvar@google.com Giuseppe Persiano Università di Salerno giuper@gmail.com Joon Young Seo Google jyseo@google.com Kevin Yeo Columbia University, Google kwlyeo@google.com
Pseudocode Yes Algorithm 1 DPSet.Encode algorithm
Open Source Code No We plan to open source the code in the near future, but at the moment they are not ready to be released publicly.
Open Datasets No Each trial picks input sets as k uniformly random 128-bit strings from a universe of all n = 2128 strings.
Dataset Splits No No explicit training, validation, or test dataset splits were provided. The paper describes generating input sets for trials rather than typical ML data splits.
Hardware Specification Yes We ran all experiments using a Ubuntu PC with 12 cores, 3.7 GHz Intel Xeon W-2135 and 64 GB of RAM. Our experiments enable AVX2 and AVX-512 instruction sets with SIMD instructions.
Software Dependencies No The paper mentions implementation in C++ but does not specify versions of compilers or libraries.
Experiment Setup Yes For DPSet, we use the analysis of Bienstock et al. [7] to choose appropriate parameters for δ 2 40 with parameter β = 0.05. To fit ALP and DP Count Sketch to our problem setting, we round the query results of these mechanisms to the nearest 0 or 1. We target privacy parameter δ 2 40 for all three constructions. We consider experiments for input sets of size k {212, 216, 220}. To fairly compare utility, we chose parameters to ensure that encoding sizes are approximately equal for all three constructions.