An Uncertainty Principle is a Price of Privacy-Preserving Microdata

Authors: John Abowd, Robert Ashmead, Ryan Cumings-Menon, Simson Garfinkel, Daniel Kifer, Philip Leclerc, William Sexton, Ashley Simpson, Christine Task, Pavel Zhuravlev

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We created a collection of benchmark datasets that were small enough to permit running the postprocessing algorithms thousands of times on each dataset (to estimate expected errors) but large enough to demonstrate the uncertainty principle. ... We ran the Laplace mechanism using different postprocessing strategies (described in Section 4) 1,000 times for each dataset to estimate expected squared error of each query." (A Monte Carlo sketch of this protocol appears after the table.)
Researcher Affiliation | Collaboration | John Abowd, U.S. Census Bureau and Cornell University; Robert Ashmead, U.S. Census Bureau; Simson Garfinkel, (formerly) U.S. Census Bureau, U.S. Department of Homeland Security, and George Washington University; Daniel Kifer, U.S. Census Bureau and Penn State University; William Sexton, (formerly) U.S. Census Bureau and Tumult Labs; Ashley Simpson, Knexus; Christine Task, Knexus; Pavel Zhuravlev, U.S. Census Bureau
Pseudocode | Yes | Algorithm 1: Sequential Fitting (Postprocessing); Algorithm 2: ReWeighted Fitting (Postprocessing). (A sketch of the sequential-fitting idea appears after the table.)
Open Source Code | Yes | "To make our code fully open source, we wrote it in Julia [5] and after trying several open-source optimizers, we settled on COSMO [21]. ... See https://github.com/uscensusbureau/CostOfMicrodataNeurIPS2021 for the code and data." (A hypothetical COSMO usage sketch appears after the table.)
Open Datasets | Yes | "We created a collection of benchmark datasets ... The full benchmark of 15 real datasets and 16 synthetic datasets is described in the supplementary material. Here we present results for an interesting subset. ... The other 15 datasets we discuss here were taken from the 2016 ACS Public-Use Microdata Sample [39]. ... See https://github.com/uscensusbureau/CostOfMicrodataNeurIPS2021 for the code and data."
Dataset Splits | No | The paper describes using benchmark datasets and running algorithms multiple times to estimate expected errors, but it does not specify explicit training, validation, or test dataset splits.
Hardware Specification | No | The paper does not provide specific hardware details (such as CPU/GPU models or memory) used to run the experiments.
Software Dependencies | No | The paper names Julia and COSMO as software dependencies, but does not provide specific version numbers for them.
Experiment Setup | Yes | "For these datasets, we applied the Laplace mechanism with ϵ = 0.5 to answer the sum query, both 1-way marginal queries, and identity queries (for each cell, how many people are in it). This is also the priority order used by Sequential Fitting." (An illustrative construction of this query workload appears after the table.)
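
The evaluation protocol quoted in the Research Type row (run the Laplace mechanism many times per dataset and average the squared error) can be illustrated with a short Monte Carlo sketch. This is a minimal illustration, not the authors' code: the toy histogram, the function names, and the sensitivity-1 budget assumption are all mine.

```julia
# Minimal Monte Carlo sketch (not the authors' code) of estimating the
# expected squared error of the Laplace mechanism on a small histogram.

# Draw Laplace(0, b) noise by inverse-CDF sampling (no packages needed).
function laplace_noise(b)
    u = rand() - 0.5
    return -b * sign(u) * log(1 - 2 * abs(u))
end

# Average total squared error over `runs` independent mechanism runs,
# mirroring the paper's 1,000 runs per dataset.
function expected_squared_error(hist; epsilon = 0.5, runs = 1000)
    b = 1 / epsilon               # assumes a sensitivity-1 workload
    total = 0.0
    for _ in 1:runs
        noisy = [h + laplace_noise(b) for h in hist]   # fresh noise per cell
        total += sum((noisy .- hist) .^ 2)
    end
    return total / runs           # approaches 2b^2 per cell (Laplace variance)
end

hist = [12.0, 7.0, 3.0, 25.0]               # toy 4-cell histogram
println(expected_squared_error(hist))        # ≈ 4 * 2 * (1/0.5)^2 = 32
```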
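The paper's Algorithm 1 (Sequential Fitting) fits noisy query answers in priority order, with later stages constrained to agree with earlier fits. The sketch below shows only that idea in a two-stage closed form (total first, then cells); the actual algorithm handles general query sets with an optimizer, so the concrete form here is an illustrative assumption.

```julia
# Rough sketch of the sequential-fitting idea only: fit queries in priority
# order, constraining later stages to agree with earlier fits. The paper's
# Algorithm 1 is more general; this two-stage closed form is an assumption.

# Stage 1: accept the noisy total as the fitted total.
# Stage 2: least-squares fit of the cells to their noisy answers subject to
# summing to the fitted total; the discrepancy spreads equally over cells.
function sequential_fit(noisy_total, noisy_cells)
    fitted_total = noisy_total
    adjustment = (fitted_total - sum(noisy_cells)) / length(noisy_cells)
    return noisy_cells .+ adjustment
end

cells = sequential_fit(50.0, [13.1, 6.2, 2.7, 26.4])
println(cells)        # adjusted cells
println(sum(cells))   # ≈ 50.0, matching the fitted total
```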
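Since the paper names its open-source stack (Julia with the COSMO solver), a hypothetical usage sketch follows: a nonnegative least-squares postprocessing fit posed through JuMP. The model is my example, not the paper's formulation, and it assumes the JuMP and COSMO packages are installed.

```julia
# Hypothetical usage sketch of the stack the paper names (Julia + COSMO):
# a nonnegative least-squares postprocessing fit via JuMP. The model is an
# illustration, not the paper's formulation; requires JuMP and COSMO.
using JuMP, COSMO

noisy = [13.1, 6.2, -0.8, 26.4]    # Laplace noise can push counts negative
model = Model(COSMO.Optimizer)
set_silent(model)
@variable(model, x[1:4] >= 0)       # enforce nonnegativity in postprocessing
@objective(model, Min, sum((x[i] - noisy[i])^2 for i in 1:4))
optimize!(model)
println(value.(x))                  # the negative cell is pulled up to 0
```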
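Finally, the Experiment Setup row lists the query workload: the sum query, both 1-way marginal queries, and the identity queries. Below is an illustrative construction of that workload as matrices acting on a flattened 2x2 histogram; the histogram values and the matrix layout are my assumptions, not taken from the paper.

```julia
# Illustrative construction (mine, under the quoted setup) of the query
# workload for a 2x2 histogram: the sum query, both 1-way marginals, and
# the identity queries, each as a row acting on the flattened histogram.
hist = [12.0 7.0; 3.0 25.0]    # 2x2 histogram over two binary attributes
h = vec(hist)                   # column-major flatten: [12.0, 3.0, 7.0, 25.0]

sum_query      = [1.0 1.0 1.0 1.0]      # total count
marginal_attr1 = [1.0 0.0 1.0 0.0;      # marginal over the first attribute
                  0.0 1.0 0.0 1.0]
marginal_attr2 = [1.0 1.0 0.0 0.0;      # marginal over the second attribute
                  0.0 0.0 1.0 1.0]
identity_queries = [1.0 0.0 0.0 0.0;    # one query per cell
                    0.0 1.0 0.0 0.0;
                    0.0 0.0 1.0 0.0;
                    0.0 0.0 0.0 1.0]

workload = vcat(sum_query, marginal_attr1, marginal_attr2, identity_queries)
println(workload * h)   # exact answers; the mechanism adds Laplace noise to these
```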