reproducibilityindex.ai

Sketch-Flip-Merge: Mergeable Sketches for Private Distinct Counting

Authors: Jonathan Hehir, Daniel Ting, Graham Cormode

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our sketches dramatically outperform existing theoretical solutions in simulations and on real-world data. We evaluate our methods on both real-world and synthetic datasets. The simulations use sketches with dimensions B = 4096, P = 24 by default, using the xx Hash64 (Collet, 2022) hash function, averaged over m = 1000 trials.
Researcher Affiliation	Collaboration	Jonathan Hehir 1 2 Daniel Ting 3 Graham Cormode 3 4 1Department of Statistics, Penn State University, State College, PA, USA 2Work performed while at Meta. 3Meta, Seattle, WA, USA 4University of Warwick, Coventry, UK. Correspondence to: Daniel Ting <dting@meta.com>.
Pseudocode	No	The paper presents mathematical formulas and theoretical analyses, including proofs in the appendices, but it does not contain a distinct block of pseudocode or a formally labeled algorithm.
Open Source Code	No	The paper does not provide any statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets	Yes	Real data is taken from the Bitcoin Heist paper (Akcora et al., 2020), which provides a database of Bitcoin transactions to n = 2,631,095 unique addresses. The dataset is available in the UCI repository under the CC BY 4.0 license.
Dataset Splits	No	The paper mentions the use of synthetic and real-world datasets for evaluation, but it does not specify any training, validation, or test split percentages or counts, nor does it reference predefined splits.
Hardware Specification	No	The paper does not specify the particular hardware components (e.g., CPU, GPU models) used for running the experiments.
Software Dependencies	No	The paper mentions 'using the xx Hash64 (Collet, 2022) hash function' but does not provide version numbers for this or any other software libraries or dependencies used in the experiments.
Experiment Setup	Yes	The simulations use sketches with dimensions B = 4096, P = 24 by default, using the xx Hash64 (Collet, 2022) hash function, averaged over m = 1000 trials. We compare these methods against the sketch and estimator of Pagh & Stausholm (2021) implemented two ways: PS (loose) constructs sketches using MPS ε = Mp,q with p = 1/2, q = 1/(2 + ε) as prescribed in Pagh & Stausholm (2021), while PS (tight) uses the tightened Mxor ε (Definition 4.2). We compare private methods against the real-world Bitcoin Heist data over a variety of privacy budgets ε, ranging from 0.25 to 4.