Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Differentially Private Submodular Maximization: Data Summarization in Disguise
Authors: Marko Mitrovic, Mark Bun, Andreas Krause, Amin Karbasi
ICML 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We additionally provide two concrete experiments to validate the efficacy of these algorithms. |
| Researcher Affiliation | Academia | 1Yale University, New Haven, CT, USA 2Princeton University, Princeton, NJ, USA 3ETH Zurich, Zurich, Switzerland. |
| Pseudocode | Yes | Algorithm 1 Diff. Private Greedy (Cardinality) GO; Algorithm 2 Differentially Private Greedy (p-system) GO; Algorithm 3 Diff. Private Subsample-Greedy SGO |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., a link or explicit statement of release) for the source code of the methodology described. |
| Open Datasets | Yes | We analyze a dataset of 10,000 Uber pickups in Manhattan in April 2014 (Uber Dataset). [...] Uber Dataset. Uber pickups in new york city. URL https://www. kaggle.com/fivethirtyeight/ uber-pickups-in-new-york-city. [...] We analyze a dataset created from a combination of National Health Examination Surveys ranging from 2007 to 2014 (NHANESDataset). [...] NHANESDataset. National health and nutrition examination survey (2007 2014). URL https://wwwn. cdc.gov/nchs/nhanes/default.aspx. |
| Dataset Splits | No | The paper mentions running simulations (e.g., "average utility across 100 simulations") but does not specify explicit train/validation/test splits, sample counts for each split, or cross-validation setup for the datasets used. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper describes the algorithms and experiments but does not list any specific software dependencies with version numbers. |
| Experiment Setup | Yes | We use ϵ = 0.1 and δ = 2^-20. For our settings of parameters, basic composition outperforms advanced composition, so the privacy budget of ϵ = 0.1 is split equally across the k iterations, meaning the mechanism at each iteration uses ϵ0 = ϵ/k. Our figures plot the average utility across 100 simulations. [...] We run 1,000 simulations with ϵ = 1.0 and δ = 2^-20. |