An Uncertainty Principle is a Price of Privacy-Preserving Microdata
Authors: John Abowd, Robert Ashmead, Ryan Cumings-Menon, Simson Garfinkel, Daniel Kifer, Philip Leclerc, William Sexton, Ashley Simpson, Christine Task, Pavel Zhuravlev
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We created a collection of benchmark datasets that were small enough to permit running the postprocessing algorithms thousands of times on each dataset (to estimate expected errors) but large enough to demonstrate the uncertainty principle. ... We ran the Laplace mechanism using different postprocessing strategies (described in Section 4) 1,000 times for each dataset to estimate expected squared error of each query. |
| Researcher Affiliation | Collaboration | John Abowd, U.S. Census Bureau and Cornell University; Robert Ashmead, U.S. Census Bureau; Simson Garfinkel, (formerly) U.S. Census Bureau, U.S. Department of Homeland Security and George Washington University; Daniel Kifer, U.S. Census Bureau and Penn State University; William Sexton, (formerly) U.S. Census Bureau and Tumult Labs; Ashley Simpson, Knexus; Christine Task, Knexus; Pavel Zhuravlev, U.S. Census Bureau |
| Pseudocode | Yes | Algorithm 1: Sequential Fitting (Postprocessing); Algorithm 2: ReWeighted Fitting (Postprocessing) |
| Open Source Code | Yes | To make our code fully open source, we wrote it in Julia [5] and after trying several open-source optimizers, we settled on COSMO [21]. ... See https://github.com/uscensusbureau/CostOfMicrodataNeurIPS2021 for the code and data. |
| Open Datasets | Yes | We created a collection of benchmark datasets ... The full benchmark of 15 real datasets and 16 synthetic datasets is described in the supplementary material. Here we present results for an interesting subset. ... The other 15 datasets we discuss here were taken from the 2016 ACS Public-Use Microdata Sample [39]. ... See https://github.com/uscensusbureau/CostOfMicrodataNeurIPS2021 for the code and data. |
| Dataset Splits | No | The paper describes using benchmark datasets and running algorithms multiple times to estimate expected errors, but it does not specify explicit training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (like CPU/GPU models or memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using Julia and COSMO as software, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | For these datasets, we applied the Laplace mechanism with ϵ = 0.5 to answer the sum query, both 1-way marginal queries, and identity queries (for each cell, how many people are in it). This is also the priority order used by Sequential Fitting. |
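
The Experiment Setup row fixes ϵ = 0.5 but does not say how that budget is divided among the query groups. As a worked example only, assume add/remove-one-person neighboring datasets and a single Laplace release of the whole workload. Adding or removing one person changes the sum by 1, exactly one cell of each 1-way marginal by 1, and exactly one identity query by 1, so:

$$
\Delta_1 = \underbrace{1}_{\text{sum}} + \underbrace{1}_{\text{marginal }A} + \underbrace{1}_{\text{marginal }B} + \underbrace{1}_{\text{identity}} = 4,
\qquad
b = \frac{\Delta_1}{\epsilon} = \frac{4}{0.5} = 8,
$$

$$
\mathbb{E}\big[(\tilde{q} - q)^2\big] = \operatorname{Var}\big[\operatorname{Lap}(b)\big] = 2b^2 = 128,
$$

where q is a true query answer and q̃ its noisy answer before any postprocessing. Splitting ϵ equally as ϵ/4 per query group with sensitivity 1 gives the same scale; under a replace-one-person neighbor definition the marginal and identity sensitivities would double (and the sum's would drop to 0), changing b.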
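
The Research Type row describes estimating each query's expected squared error by rerunning the Laplace mechanism plus a postprocessing strategy 1,000 times per dataset. Below is a minimal Python sketch of such a harness, not the authors' Julia code. The 2×2 table, the helper names (`laplace_noise`, `estimate_expected_sq_error`, `clamp_negative`), and the noise scale b = 4/ϵ (which assumes add/remove-one-person neighbors, giving the sum, both 1-way marginals, and the cell counts a combined L1 sensitivity of 4) are all hypothetical. `clamp_negative` is a toy postprocessing step, not one of the paper's strategies.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two i.i.d. exponentials with mean `scale` is Laplace.
    """
    rate = 1.0 / scale  # expovariate takes 1 / mean
    return rng.expovariate(rate) - rng.expovariate(rate)


def estimate_expected_sq_error(true_answers, scale, postprocess, trials=1000, seed=0):
    """Monte Carlo estimate of E[(estimate - truth)^2] for every query.

    Each trial adds fresh Laplace noise to all true answers, passes the noisy
    vector through `postprocess`, and accumulates per-query squared errors.
    """
    rng = random.Random(seed)
    sums = [0.0] * len(true_answers)
    for _ in range(trials):
        noisy = [a + laplace_noise(scale, rng) for a in true_answers]
        estimates = postprocess(noisy)
        for i, (est, truth) in enumerate(zip(estimates, true_answers)):
            sums[i] += (est - truth) ** 2
    return [s / trials for s in sums]


def clamp_negative(noisy):
    """Toy postprocessing (hypothetical): replace negative answers with 0."""
    return [max(0.0, v) for v in noisy]


# Hypothetical 2x2 table of person counts (rows: attribute A, cols: attribute B).
cells = [[3, 1], [0, 6]]
total = sum(sum(row) for row in cells)
row_marginal = [sum(row) for row in cells]
col_marginal = [sum(col) for col in zip(*cells)]
# Workload in priority order: sum, both 1-way marginals, then identity (cells).
answers = [total] + row_marginal + col_marginal + [c for row in cells for c in row]

epsilon = 0.5
scale = 4 / epsilon  # assumes L1 sensitivity 4: one unit per query group

mse_raw = estimate_expected_sq_error(answers, scale, lambda v: v)
mse_clamped = estimate_expected_sq_error(answers, scale, clamp_negative)
print("theoretical Laplace MSE:", 2 * scale ** 2)
print("raw:", [round(m, 1) for m in mse_raw])
print("clamped:", [round(m, 1) for m in mse_clamped])
```

Both runs reuse seed 0 and `clamp_negative` draws no randomness, so they see identical noise and the comparison isolates the postprocessing step. Because every true answer is nonnegative, clamping can only shrink each per-trial squared error; with no postprocessing, every entry of `mse_raw` should land near the Laplace variance 2b² = 128.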
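
The Pseudocode row names Sequential Fitting, and the Experiment Setup row says it processes query groups in the order sum, 1-way marginals, identity. The sketch below assumes Sequential Fitting works lexicographically: at each priority level it picks the table that best fits (in least squares) that level's noisy answers while keeping every earlier level's fitted answers fixed. It is a simplified stand-in, not the paper's Algorithm 1: it drops the nonnegativity and integrality constraints that microdata require (the authors' implementation relies on the COSMO optimizer), which lets each stage be solved in closed form for a two-way table. The function name `sequential_fit` and the noisy inputs are hypothetical.

```python
def sequential_fit(noisy_total, noisy_rows, noisy_cols, noisy_cells):
    """Unconstrained sequential least-squares fit of a two-way table.

    Priority order: total, then both 1-way marginals, then individual cells.
    Each stage minimizes squared error to its own noisy answers subject to
    agreeing exactly with all earlier fitted answers. No nonnegativity or
    integrality constraints are imposed (a simplification).
    """
    n_rows, n_cols = len(noisy_rows), len(noisy_cols)

    # Stage 1: nothing constrains the total, so its best fit is the noisy value.
    total = noisy_total

    # Stage 2: closest marginals whose entries sum to the fitted total
    # (spread each marginal's discrepancy evenly over its entries).
    rows = [r + (total - sum(noisy_rows)) / n_rows for r in noisy_rows]
    cols = [c + (total - sum(noisy_cols)) / n_cols for c in noisy_cols]

    # Stage 3: orthogonal projection of the noisy cells onto the set of
    # tables whose row and column sums equal the fitted marginals.
    row_gap = [rows[i] - sum(noisy_cells[i]) for i in range(n_rows)]
    col_gap = [cols[j] - sum(noisy_cells[i][j] for i in range(n_rows))
               for j in range(n_cols)]
    grand_gap = total - sum(sum(row) for row in noisy_cells)
    cells = [[noisy_cells[i][j] + row_gap[i] / n_cols + col_gap[j] / n_rows
              - grand_gap / (n_rows * n_cols)
              for j in range(n_cols)]
             for i in range(n_rows)]
    return total, rows, cols, cells


# Hypothetical noisy answers for a 2x2 table.
fit_total, fit_rows, fit_cols, fit_cells = sequential_fit(
    9.4, [4.7, 5.9], [2.2, 8.1], [[3.5, -0.8], [1.2, 6.9]]
)
print(fit_total, fit_rows, fit_cols, fit_cells)
```

With these inputs the fitted answers are mutually consistent (each row and column of `fit_cells` sums to the fitted marginals, which in turn sum to the fitted total), yet the cell in row 2, column 1 comes out negative (≈ −1.325). Nonnegativity constraints exist to rule out exactly this kind of output, which is why a faithful implementation needs a constrained solver rather than these closed forms.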