Differentially Private Submodular Maximization: Data Summarization in Disguise

Authors: Marko Mitrovic, Mark Bun, Andreas Krause, Amin Karbasi

ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We additionally provide two concrete experiments to validate the efficacy of these algorithms.
Researcher Affiliation Academia 1Yale University, New Haven, CT, USA 2Princeton University, Princeton, NJ, USA 3ETH Zurich, Zurich, Switzerland.
Pseudocode Yes Algorithm 1 Diff. Private Greedy (Cardinality) GO; Algorithm 2 Differentially Private Greedy (p-system) GO; Algorithm 3 Diff. Private Subsample-Greedy SGO
Open Source Code No The paper does not provide any concrete access information (e.g., a link or explicit statement of release) for the source code of the methodology described.
Open Datasets Yes We analyze a dataset of 10,000 Uber pickups in Manhattan in April 2014 (Uber Dataset). [...] Uber Dataset. Uber pickups in new york city. URL https://www. kaggle.com/fivethirtyeight/ uber-pickups-in-new-york-city. [...] We analyze a dataset created from a combination of National Health Examination Surveys ranging from 2007 to 2014 (NHANESDataset). [...] NHANESDataset. National health and nutrition examination survey (2007 2014). URL https://wwwn. cdc.gov/nchs/nhanes/default.aspx.
Dataset Splits No The paper mentions running simulations (e.g., "average utility across 100 simulations") but does not specify explicit train/validation/test splits, sample counts for each split, or cross-validation setup for the datasets used.
Hardware Specification No The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies No The paper describes the algorithms and experiments but does not list any specific software dependencies with version numbers.
Experiment Setup Yes We use ϵ = 0.1 and δ = 2^-20. For our settings of parameters, basic composition outperforms advanced composition, so the privacy budget of ϵ = 0.1 is split equally across the k iterations, meaning the mechanism at each iteration uses ϵ0 = ϵ/k. Our figures plot the average utility across 100 simulations. [...] We run 1,000 simulations with ϵ = 1.0 and δ = 2^-20.