Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Maximum Coverage in Turnstile Streams with Applications to Fingerprinting Measures
Authors: Alina Ene, Alessandro Epasto, Vahab Mirrokni, Hoai-An Nguyen, Huy Nguyen, David Woodruff, Peilin Zhong
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluation confirms the practicality of our fingerprinting algorithms demonstrating a speedup of up to 210x over prior work. Experimental Results. We illustrate the practicality of our fingerprinting algorithms by running experiments on two different datasets. |
| Researcher Affiliation | Collaboration | Alina Ene¹, Alessandro Epasto², Vahab Mirrokni², Hoai-An Nguyen³, Huy L. Nguyen⁴, David P. Woodruff²,³, Peilin Zhong². ¹Boston University, ²Google Research, ³Carnegie Mellon University, ⁴Northeastern University. |
| Pseudocode | Yes | Algorithm 1 building-A (n × d matrix A, ε ∈ (0, 1), k) ... Algorithm 2 max-coverage ... Algorithm 3 A (k, ε, δ) ... Algorithm 4 k-cover ... Algorithm 5 Max-Coverage-LS (n × d matrix A, ε ∈ (0, 1), k) ... Algorithm 6 sketchy-submodular-maximization ... Algorithm 7 p-Tuples-Sketch (n × 1 vector x, constant integer p ≥ 2, γ, δ ∈ (0, 1)) ... Algorithm 8 general-fingerprinting-sketch (n × d matrix A, ε ∈ (0, 1), k > 0) |
| Open Source Code | Yes | All experiments were run locally on an M2 MacBook Air. The code can be found here. |
| Open Datasets | Yes | We use two publicly-available datasets, the UC Irvine Adult and US Census Data (1990) (Becker & Kohavi, 1996; Meek et al.). Aeberhard, S. and Forina, M. Wine. UCI Machine Learning Repository, 1991. DOI: https://doi.org/10.24432/C5PC7J. Becker, B. and Kohavi, R. Adult. UCI Machine Learning Repository, 1996. DOI: https://doi.org/10.24432/C5XW20. Meek, C., Thiesson, B., and Heckerman, D. US Census Data (1990). UCI Machine Learning Repository. DOI: https://doi.org/10.24432/C5VP42. |
| Dataset Splits | No | The paper does not explicitly provide specific dataset split information (e.g., percentages, sample counts for training, validation, or test sets). |
| Hardware Specification | Yes | All experiments were run locally on an M2 MacBook Air. |
| Software Dependencies | No | The paper mentions the implementation of a baseline ( |
| Experiment Setup | Yes | The main variable we vary is the size of our L0 sketch, specifically with 300, 600, 900, and 1,250 rows. Then, we ran k-means with 3 clusters (for the 3 wine types) using just the selected features. |