Differentially Private Set Representations
Authors: Sarvar Patel, Giuseppe Persiano, Joon Young Seo, Kevin Yeo
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 Experimental evaluation Setup. We implemented DPSet, ALP [2] and DP Count Sketch [34] in C++ using 800 lines of code. For DPSet, we use the analysis of Bienstock et al. [7] to choose appropriate parameters for δ 2 40 with parameter β = 0.05. To fit ALP and DP Count Sketch to our problem setting, we round the query results of these mechanisms to the nearest 0 or 1. We target privacy parameter δ 2 40 for all three constructions. To fairly compare utility, we chose parameters to ensure that encoding sizes are approximately equal for all three constructions (see Appendix G for more details on encoding sizes). We consider experiments for input sets of size k {212, 216, 220}. Each trial picks input sets as k uniformly random 128-bit strings from a universe of all n = 2128 strings. Although, all three constructions are agnostic to the distribution of the input set. We ran all experiments using a Ubuntu PC with 12 cores, 3.7 GHz Intel Xeon W-2135 and 64 GB of RAM. Our experiments enable AVX2 and AVX-512 instruction sets with SIMD instructions. All reported results use single-thread execution as the average of at least 1,000 trials with standard deviation less than 10% of the average. |
| Researcher Affiliation | Collaboration | Sarvar Patel Google sarvar@google.com Giuseppe Persiano Università di Salerno giuper@gmail.com Joon Young Seo Google jyseo@google.com Kevin Yeo Columbia University, Google kwlyeo@google.com |
| Pseudocode | Yes | Algorithm 1 DPSet.Encode algorithm |
| Open Source Code | No | We plan to open source the code in the near future, but at the moment they are not ready to be released publicly. |
| Open Datasets | No | Each trial picks input sets as k uniformly random 128-bit strings from a universe of all n = 2128 strings. |
| Dataset Splits | No | No explicit training, validation, or test dataset splits were provided. The paper describes generating input sets for trials rather than typical ML data splits. |
| Hardware Specification | Yes | We ran all experiments using a Ubuntu PC with 12 cores, 3.7 GHz Intel Xeon W-2135 and 64 GB of RAM. Our experiments enable AVX2 and AVX-512 instruction sets with SIMD instructions. |
| Software Dependencies | No | The paper mentions implementation in C++ but does not specify versions of compilers or libraries. |
| Experiment Setup | Yes | For DPSet, we use the analysis of Bienstock et al. [7] to choose appropriate parameters for δ 2 40 with parameter β = 0.05. To fit ALP and DP Count Sketch to our problem setting, we round the query results of these mechanisms to the nearest 0 or 1. We target privacy parameter δ 2 40 for all three constructions. We consider experiments for input sets of size k {212, 216, 220}. To fairly compare utility, we chose parameters to ensure that encoding sizes are approximately equal for all three constructions. |