Bias and Variance of Post-processing in Differential Privacy

Authors: Keyu Zhu, Pascal Van Hentenryck, Ferdinando Fioretto

Pages: 11177-11184

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This paper takes a first step towards understanding the properties of post-processing. It considers the release of census data and examines, both theoretically and empirically, the behavior of a widely adopted class of post-processing functions. The theoretical results in this paper are illustrated using an empirical analysis from this census case study.
Researcher Affiliation | Academia | 1 Georgia Institute of Technology, 2 Syracuse University; keyu.zhu@gatech.edu, pvh@isye.gatech.edu, ffiorett@syr.edu
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper.
Open Datasets | No | The paper uses 'census data' for its empirical analysis and refers to the US Census Bureau and a specific paper (Abowd 2018) as a source or motivation. However, it does not provide a specific link, DOI, repository name, or citation with author/year for the exact dataset used in its experiments, nor does it explicitly state that the data is publicly available with access details. For instance, it mentions 'New Mexico and its counties' and 'states of Arizona and Texas' without providing access to these specific data instances.
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce data partitioning for training, validation, or testing.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup | Yes | The privacy budget ϵ is set to be 0.5 and the experiments perform 100 independent runs. The state has a population of 7,289,112, 33 counties, rm = 348, and the experiment uses λ = 5. For λ = 10, the distribution variances are 186.67 and 199.21 for Arizona and Texas respectively. Over 80,000 experiments, the empirical variances were 186.88 and 199.32 respectively.
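
The setup above compares distribution variances with empirical variances measured over many repeated runs. Since the paper provides no code, the following Python sketch only illustrates that kind of check and is not the authors' method. It assumes that λ is the scale of zero-mean Laplace noise and that post-processing is a simple clamp to non-negative values. The function names, the clamp, and the true count of 5.0 are all hypothetical and not taken from the paper. Under these assumptions, unprocessed Laplace noise with scale λ has variance 2λ², which is 200 for λ = 10.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a zero-mean Laplace distribution with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def mean(values):
    return sum(values) / len(values)


def variance(values):
    # Population variance of the sample (divides by n).
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def simulate(true_count, scale, runs, seed=0):
    """Estimate bias and variance before and after a non-negativity clamp.

    The clamp is an assumed, illustrative post-processing step.
    """
    rng = random.Random(seed)
    noisy = [true_count + laplace_noise(scale, rng) for _ in range(runs)]
    post = [max(0.0, v) for v in noisy]  # hypothetical post-processing
    return {
        "noisy_bias": mean(noisy) - true_count,
        "noisy_var": variance(noisy),
        "post_bias": mean(post) - true_count,
        "post_var": variance(post),
    }


if __name__ == "__main__":
    # true_count=5.0 is a made-up value chosen so the clamp is active.
    stats = simulate(true_count=5.0, scale=10.0, runs=80_000)
    for name, value in stats.items():
        print(f"{name}: {value:.3f}")
```

In this sketch the clamp raises every negative release to zero, so the post-processed estimate picks up a positive bias while its variance drops below that of the raw noisy release.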