An Uncertainty Principle is a Price of Privacy-Preserving Microdata

Authors: John Abowd, Robert Ashmead, Ryan Cumings-Menon, Simson Garfinkel, Daniel Kifer, Philip Leclerc, William Sexton, Ashley Simpson, Christine Task, Pavel Zhuravlev

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We created a collection of benchmark datasets that were small enough to permit running the postprocessing algorithms thousands of times on each dataset (to estimate expected errors) but large enough to demonstrate the uncertainty principle. ... We ran the Laplace mechanism using different postprocessing strategies (described in Section 4) 1,000 times for each dataset to estimate expected squared error of each query." (A Monte Carlo sketch of this protocol appears after the table.)
Researcher Affiliation | Collaboration | John Abowd, U.S. Census Bureau and Cornell University; Robert Ashmead, U.S. Census Bureau; Simson Garfinkel, (formerly) U.S. Census Bureau, U.S. Department of Homeland Security, and George Washington University; Daniel Kifer, U.S. Census Bureau and Penn State University; William Sexton, (formerly) U.S. Census Bureau and Tumult Labs; Ashley Simpson, Knexus; Christine Task, Knexus; Pavel Zhuravlev, U.S. Census Bureau
Pseudocode | Yes | Algorithm 1: Sequential Fitting (Postprocessing); Algorithm 2: ReWeighted Fitting (Postprocessing). (A sketch of the sequential-fitting idea appears after the table.)
Open Source Code | Yes | "To make our code fully open source, we wrote it in Julia [5] and after trying several open-source optimizers, we settled on COSMO [21]. ... See https://github.com/uscensusbureau/CostOfMicrodataNeurIPS2021 for the code and data." (A hypothetical COSMO usage sketch appears after the table.)
Open Datasets | Yes | "We created a collection of benchmark datasets ... The full benchmark of 15 real datasets and 16 synthetic datasets is described in the supplementary material. Here we present results for an interesting subset. ... The other 15 datasets we discuss here were taken from the 2016 ACS Public-Use Microdata Sample [39]. ... See https://github.com/uscensusbureau/CostOfMicrodataNeurIPS2021 for the code and data."
Dataset Splits | No | The paper describes using benchmark datasets and running algorithms multiple times to estimate expected errors, but it does not specify explicit training, validation, or test dataset splits.
Hardware Specification | No | The paper does not provide specific hardware details (such as CPU/GPU models or memory) used to run the experiments.
Software Dependencies | No | The paper names Julia and COSMO as software dependencies, but does not provide specific version numbers for them.
Experiment Setup | Yes | "For these datasets, we applied the Laplace mechanism with ϵ = 0.5 to answer the sum query, both 1-way marginal queries, and identity queries (for each cell, how many people are in it). This is also the priority order used by Sequential Fitting." (An illustrative construction of this query workload appears after the table.)
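
The evaluation protocol quoted in the Research Type row (run the Laplace mechanism many times per dataset and average the squared error) can be illustrated with a short Monte Carlo sketch. This is a minimal illustration, not the authors' code: the toy histogram, the function names, and the sensitivity-1 budget assumption are all mine.

```julia
# Minimal Monte Carlo sketch (not the authors' code) of estimating the
# expected squared error of the Laplace mechanism on a small histogram.

# Draw Laplace(0, b) noise by inverse-CDF sampling (no packages needed).
function laplace_noise(b)
    u = rand() - 0.5
    return -b * sign(u) * log(1 - 2 * abs(u))
end

# Average total squared error over `runs` independent mechanism runs,
# mirroring the paper's 1,000 runs per dataset.
function expected_squared_error(hist; epsilon = 0.5, runs = 1000)
    b = 1 / epsilon               # assumes a sensitivity-1 workload
    total = 0.0
    for _ in 1:runs
        noisy = [h + laplace_noise(b) for h in hist]   # fresh noise per cell
        total += sum((noisy .- hist) .^ 2)
    end
    return total / runs           # approaches 2b^2 per cell (Laplace variance)
end

hist = [12.0, 7.0, 3.0, 25.0]               # toy 4-cell histogram
println(expected_squared_error(hist))        # ≈ 4 * 2 * (1/0.5)^2 = 32
```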
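The paper's Algorithm 1 (Sequential Fitting) fits noisy query answers in priority order, with later stages constrained to agree with earlier fits. The sketch below shows only that idea in a two-stage closed form (total first, then cells); the actual algorithm handles general query sets with an optimizer, so the concrete form here is an illustrative assumption.

```julia
# Rough sketch of the sequential-fitting idea only: fit queries in priority
# order, constraining later stages to agree with earlier fits. The paper's
# Algorithm 1 is more general; this two-stage closed form is an assumption.

# Stage 1: accept the noisy total as the fitted total.
# Stage 2: least-squares fit of the cells to their noisy answers subject to
# summing to the fitted total; the discrepancy spreads equally over cells.
function sequential_fit(noisy_total, noisy_cells)
    fitted_total = noisy_total
    adjustment = (fitted_total - sum(noisy_cells)) / length(noisy_cells)
    return noisy_cells .+ adjustment
end

cells = sequential_fit(50.0, [13.1, 6.2, 2.7, 26.4])
println(cells)        # adjusted cells
println(sum(cells))   # ≈ 50.0, matching the fitted total
```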
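Since the paper names its open-source stack (Julia with the COSMO solver), a hypothetical usage sketch follows: a nonnegative least-squares postprocessing fit posed through JuMP. The model is my example, not the paper's formulation, and it assumes the JuMP and COSMO packages are installed.

```julia
# Hypothetical usage sketch of the stack the paper names (Julia + COSMO):
# a nonnegative least-squares postprocessing fit via JuMP. The model is an
# illustration, not the paper's formulation; requires JuMP and COSMO.
using JuMP, COSMO

noisy = [13.1, 6.2, -0.8, 26.4]    # Laplace noise can push counts negative
model = Model(COSMO.Optimizer)
set_silent(model)
@variable(model, x[1:4] >= 0)       # enforce nonnegativity in postprocessing
@objective(model, Min, sum((x[i] - noisy[i])^2 for i in 1:4))
optimize!(model)
println(value.(x))                  # the negative cell is pulled up to 0
```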
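Finally, the Experiment Setup row lists the query workload: the sum query, both 1-way marginal queries, and the identity queries. Below is an illustrative construction of that workload as matrices acting on a flattened 2x2 histogram; the histogram values and the matrix layout are my assumptions, not taken from the paper.

```julia
# Illustrative construction (mine, under the quoted setup) of the query
# workload for a 2x2 histogram: the sum query, both 1-way marginals, and
# the identity queries, each as a row acting on the flattened histogram.
hist = [12.0 7.0; 3.0 25.0]    # 2x2 histogram over two binary attributes
h = vec(hist)                   # column-major flatten: [12.0, 3.0, 7.0, 25.0]

sum_query      = [1.0 1.0 1.0 1.0]      # total count
marginal_attr1 = [1.0 0.0 1.0 0.0;      # marginal over the first attribute
                  0.0 1.0 0.0 1.0]
marginal_attr2 = [1.0 1.0 0.0 0.0;      # marginal over the second attribute
                  0.0 0.0 1.0 1.0]
identity_queries = [1.0 0.0 0.0 0.0;    # one query per cell
                    0.0 1.0 0.0 0.0;
                    0.0 0.0 1.0 0.0;
                    0.0 0.0 0.0 1.0]

workload = vcat(sum_query, marginal_attr1, marginal_attr2, identity_queries)
println(workload * h)   # exact answers; the mechanism adds Laplace noise to these
```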