Sketch-Flip-Merge: Mergeable Sketches for Private Distinct Counting
Authors: Jonathan Hehir, Daniel Ting, Graham Cormode
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our sketches dramatically outperform existing theoretical solutions in simulations and on real-world data. We evaluate our methods on both real-world and synthetic datasets. The simulations use sketches with dimensions B = 4096, P = 24 by default, using the xx Hash64 (Collet, 2022) hash function, averaged over m = 1000 trials. |
| Researcher Affiliation | Collaboration | Jonathan Hehir 1 2 Daniel Ting 3 Graham Cormode 3 4 1Department of Statistics, Penn State University, State College, PA, USA 2Work performed while at Meta. 3Meta, Seattle, WA, USA 4University of Warwick, Coventry, UK. Correspondence to: Daniel Ting <dting@meta.com>. |
| Pseudocode | No | The paper presents mathematical formulas and theoretical analyses, including proofs in the appendices, but it does not contain a distinct block of pseudocode or a formally labeled algorithm. |
| Open Source Code | No | The paper does not provide any statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | Real data is taken from the Bitcoin Heist paper (Akcora et al., 2020), which provides a database of Bitcoin transactions to n = 2,631,095 unique addresses. The dataset is available in the UCI repository under the CC BY 4.0 license. |
| Dataset Splits | No | The paper mentions the use of synthetic and real-world datasets for evaluation, but it does not specify any training, validation, or test split percentages or counts, nor does it reference predefined splits. |
| Hardware Specification | No | The paper does not specify the particular hardware components (e.g., CPU, GPU models) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'using the xx Hash64 (Collet, 2022) hash function' but does not provide version numbers for this or any other software libraries or dependencies used in the experiments. |
| Experiment Setup | Yes | The simulations use sketches with dimensions B = 4096, P = 24 by default, using the xx Hash64 (Collet, 2022) hash function, averaged over m = 1000 trials. We compare these methods against the sketch and estimator of Pagh & Stausholm (2021) implemented two ways: PS (loose) constructs sketches using MPS ε = Mp,q with p = 1/2, q = 1/(2 + ε) as prescribed in Pagh & Stausholm (2021), while PS (tight) uses the tightened Mxor ε (Definition 4.2). We compare private methods against the real-world Bitcoin Heist data over a variety of privacy budgets ε, ranging from 0.25 to 4. |