Sketch-Flip-Merge: Mergeable Sketches for Private Distinct Counting

Authors: Jonathan Hehir, Daniel Ting, Graham Cormode

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our sketches dramatically outperform existing theoretical solutions in simulations and on real-world data. We evaluate our methods on both real-world and synthetic datasets. The simulations use sketches with dimensions B = 4096, P = 24 by default, using the xx Hash64 (Collet, 2022) hash function, averaged over m = 1000 trials.
Researcher Affiliation Collaboration Jonathan Hehir 1 2 Daniel Ting 3 Graham Cormode 3 4 1Department of Statistics, Penn State University, State College, PA, USA 2Work performed while at Meta. 3Meta, Seattle, WA, USA 4University of Warwick, Coventry, UK. Correspondence to: Daniel Ting <dting@meta.com>.
Pseudocode No The paper presents mathematical formulas and theoretical analyses, including proofs in the appendices, but it does not contain a distinct block of pseudocode or a formally labeled algorithm.
Open Source Code No The paper does not provide any statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets Yes Real data is taken from the Bitcoin Heist paper (Akcora et al., 2020), which provides a database of Bitcoin transactions to n = 2,631,095 unique addresses. The dataset is available in the UCI repository under the CC BY 4.0 license.
Dataset Splits No The paper mentions the use of synthetic and real-world datasets for evaluation, but it does not specify any training, validation, or test split percentages or counts, nor does it reference predefined splits.
Hardware Specification No The paper does not specify the particular hardware components (e.g., CPU, GPU models) used for running the experiments.
Software Dependencies No The paper mentions 'using the xx Hash64 (Collet, 2022) hash function' but does not provide version numbers for this or any other software libraries or dependencies used in the experiments.
Experiment Setup Yes The simulations use sketches with dimensions B = 4096, P = 24 by default, using the xx Hash64 (Collet, 2022) hash function, averaged over m = 1000 trials. We compare these methods against the sketch and estimator of Pagh & Stausholm (2021) implemented two ways: PS (loose) constructs sketches using MPS ε = Mp,q with p = 1/2, q = 1/(2 + ε) as prescribed in Pagh & Stausholm (2021), while PS (tight) uses the tightened Mxor ε (Definition 4.2). We compare private methods against the real-world Bitcoin Heist data over a variety of privacy budgets ε, ranging from 0.25 to 4.