Private Synthetic Data for Multitask Learning and Marginal Queries

Authors: Giuseppe Vietri, Cedric Archambeau, Sergul Aydore, William Brown, Michael Kearns, Aaron Roth, Ankit Siva, Shuai Tang, Zhiwei Steven Wu

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide comprehensive empirical evaluations comparing RAP++ against several benchmarks, including PGM [MSM19], DP-CTGAN [FDK22], and DP-MERF [HAP21] on datasets derived from the US Census.
Researcher Affiliation | Collaboration | Giuseppe Vietri (University of Minnesota, vietr002@umn.edu); Cedric Archambeau (Amazon AWS AI/ML, cedrica@amazon.com); Sergul Aydore (Amazon AWS AI/ML, saydore@amazon.com); William Brown (Columbia University, w.brown@columbia.edu); Michael Kearns (Amazon AWS AI/ML and University of Pennsylvania, mkearns@cis.upenn.edu); Aaron Roth (Amazon AWS AI/ML and University of Pennsylvania, aaroth@cis.upenn.edu); Ankit Siva (Amazon AWS AI/ML, ankitsiv@amazon.com); Shuai Tang (Amazon AWS AI/ML, shuat@amazon.com); Zhiwei Steven Wu (Amazon AWS AI/ML and Carnegie Mellon University, zhiweiw@andrew.cmu.edu)
Pseudocode | Yes | Algorithm 1: Relaxed Projection with Sigmoid Temperature Annealing
Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]
Open Datasets | Yes | We use a suite of new datasets derived from the US Census, which are introduced in [DHMS21]. These datasets include five pre-defined prediction tasks, predicting labels related to income, employment, health, transportation, and housing. Each task defines feature columns (categorical and numerical) and target columns, where feature columns are used to train a model to predict the target column. We used the folktables package [DHMS21] to extract features and tasks. In the appendix, we include a table that summarizes the number of categorical and numerical features for each ACS task and the number of rows in each of the 25 datasets.
Dataset Splits | No | For each dataset, we use 80 percent of the rows as a training dataset and the remainder as a test dataset. No separate validation split was explicitly mentioned.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments were provided in the paper.
Software Dependencies | No | The paper mentions JAX and PyTorch but does not provide specific version numbers for the software dependencies used in the experiments.
Experiment Setup | Yes | For both RAP and PGM we discretized all numerical features using equal-sized binning and compared performance for bin counts in {10, 20, 30, 50, 100}. We chose 30 bins for the reported results, as this performs well overall across different tasks and privacy levels; the appendix shows results for other bin sizes. The other relevant parameter for both PGM and RAP is the number of epochs of adaptivity, which is fixed to d − 1, where d is the number of columns in the data. Since RAP++ optimizes over two query classes (CM and LT), it has two further parameters: T_CM, the number of adaptive epochs for selecting CM queries, and T_LT, the number of adaptive epochs for selecting LT queries. To be consistent with PGM and RAP we always choose T_CM = d_C − 1, where d_C is the number of categorical columns in the data. Finally, K is the number of queries selected per epoch, as described in Algorithm 2. We fix T_LT = 50 and K = 10, since these values work well across all settings in our experiments.
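The "Relaxed Projection with Sigmoid Temperature Annealing" named in the Pseudocode row relaxes hard threshold (LT) queries into differentiable sigmoid surrogates whose temperature is annealed upward, so the surrogate sharpens toward the exact threshold count. A minimal sketch of that idea (the function names and the geometric schedule are illustrative assumptions, not the paper's exact implementation):

```python
import numpy as np

def sigmoid(x, temperature):
    # Higher temperature -> steeper sigmoid -> closer to a hard threshold.
    return 1.0 / (1.0 + np.exp(-temperature * x))

def relaxed_threshold_query(col, threshold, temperature):
    # Differentiable surrogate for the fraction of rows with value > threshold.
    return sigmoid(col - threshold, temperature).mean()

# Illustrative annealing schedule: start smooth, sharpen over rounds.
temperatures = np.geomspace(1.0, 100.0, num=5)
```

As the temperature grows, the relaxed answer approaches the exact query answer, while the smooth early rounds keep gradients informative for projection.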
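The 80/20 train/test partition noted in the Dataset Splits row can be sketched as a shuffled row split; the helper name, seed, and the assumption of a uniform random split are illustrative, not taken from the paper:

```python
import numpy as np

def split_rows(X, train_frac=0.8, seed=0):
    # Shuffle row indices, then take the first train_frac of rows for training.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = int(train_frac * len(X))
    return X[idx[:cut]], X[idx[cut:]]
```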
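The "equal-sized binning" with 30 bins described in the Experiment Setup row can be sketched with NumPy. The row does not say whether bins are equal-width or equal-frequency, so equal-width is an assumption here:

```python
import numpy as np

def discretize_equal_width(col, n_bins=30):
    # Map a numerical column to integer bin ids in 0..n_bins-1.
    edges = np.linspace(col.min(), col.max(), n_bins + 1)
    # Digitize against the interior edges so the maximum lands in the last bin.
    return np.digitize(col, edges[1:-1])
```

After discretization, each numerical column becomes a categorical column with `n_bins` values, which is the representation RAP and PGM consume for marginal queries.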