Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Maximum Coverage in Turnstile Streams with Applications to Fingerprinting Measures

Authors: Alina Ene, Alessandro Epasto, Vahab Mirrokni, Hoai-An Nguyen, Huy Nguyen, David Woodruff, Peilin Zhong

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluation confirms the practicality of our fingerprinting algorithms demonstrating a speedup of up to 210x over prior work. Experimental Results. We illustrate the practicality of our fingerprinting algorithms by running experiments on two different datasets.
Researcher Affiliation | Collaboration | Alina Ene (1), Alessandro Epasto (2), Vahab Mirrokni (2), Hoai-An Nguyen (3), Huy L. Nguyen (4), David P. Woodruff (2, 3), Peilin Zhong (2). (1) Boston University, (2) Google Research, (3) Carnegie Mellon University, (4) Northeastern University.
Pseudocode | Yes | Algorithm 1 building-A (n × d matrix A, ε ∈ (0, 1), k) ... Algorithm 2 max-coverage ... Algorithm 3 A (k, ε, δ) ... Algorithm 4 k-cover ... Algorithm 5 Max-Coverage-LS (n × d matrix A, ε ∈ (0, 1), k) ... Algorithm 6 sketchy-submodular-maximization ... Algorithm 7 p-Tuples-Sketch (n × 1 vector x, constant integer p 2, γ, δ ∈ (0, 1)) ... Algorithm 8 general-fingerprinting-sketch (n × d matrix A, ε ∈ (0, 1), k 0)
Open Source Code | Yes | All experiments were run locally on a M2 Mac Book Air. The code can be found here.
Open Datasets | Yes | We use two publicly-available datasets, the UC Irvine Adult and US Census Data (1990) (Becker & Kohavi, 1996; Meek et al.). Aeberhard, S. and Forina, M. Wine. UCI Machine Learning Repository, 1991. DOI: https://doi.org/10.24432/C5PC7J. Becker, B. and Kohavi, R. Adult. UCI Machine Learning Repository, 1996. DOI: https://doi.org/10.24432/C5XW20. Meek, C., Thiesson, B., and Heckerman, D. US Census Data (1990). UCI Machine Learning Repository. DOI: https://doi.org/10.24432/C5VP42.
Dataset Splits | No | The paper does not explicitly provide specific dataset split information (e.g., percentages, sample counts for training, validation, or test sets).
Hardware Specification | Yes | All experiments were run locally on a M2 Mac Book Air.
Software Dependencies | No | The paper mentions the implementation of a baseline (
Experiment Setup | Yes | The main variable we vary is the size of our L0 sketch, specifically with 300, 600, 900, and 1,250 rows. Then, we ran k-means with 3 clusters (for the 3 wine types) using just the selected features.
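The last step of the quoted setup, running k-means with 3 clusters on only the selected features, can be illustrated with a minimal sketch. This is not the authors' code: the data is synthetic, the names `select_columns`, `farthest_point_init`, and `kmeans` are hypothetical, and the selected column indices are hard-coded rather than produced by the paper's fingerprinting sketch. Plain Lloyd's iterations with a deterministic farthest-point initialization stand in for whatever k-means implementation the paper used.

```python
def select_columns(rows, selected):
    """Keep only the feature columns listed in `selected` (assumed given)."""
    return [[row[j] for j in selected] for row in rows]


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k):
    """Deterministic init: start at points[0], then repeatedly add the
    point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, iters=100):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Synthetic stand-in data: columns 0 and 2 separate three groups,
# column 1 is noise that the (assumed) feature selection discards.
rows = [
    [1.0, 9.0, 1.0], [1.2, 0.5, 0.8], [0.9, 5.0, 1.1],
    [5.0, 8.0, 5.0], [5.1, 1.0, 4.9], [4.8, 4.0, 5.2],
    [9.0, 7.0, 1.0], [9.2, 0.0, 0.9], [8.9, 6.0, 1.2],
]
selected = [0, 2]  # hypothetical output of a feature-selection step

centroids, labels = kmeans(select_columns(rows, selected), k=3)
print(labels)
```

On this toy input the three groups of rows end up in three different clusters. In an actual replication, `selected` would come from the fingerprinting algorithm at each sketch size (300, 600, 900, and 1,250 rows), and clustering quality would be compared across those sizes.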