Generating Private Synthetic Data with Genetic Algorithms
Authors: Terrance Liu, Jingwu Tang, Giuseppe Vietri, Steven Wu
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate empirically that on data with both discrete and real-valued attributes, PRIVATE-GSD outperforms the state-of-the-art methods on non-differentiable queries while matching accuracy in approximating differentiable ones. |
| Researcher Affiliation | Academia | ¹Carnegie Mellon University, ²Peking University, ³University of Minnesota. |
| Pseudocode | Yes | Algorithm 1 Private Genetic Algorithm for Synthetic Data (PRIVATE-GSD) |
| Open Source Code | Yes | The PRIVATE-GSD source code is publicly available at https://github.com/giusevtr/private_gsd. |
| Open Datasets | Yes | For our empirical evaluation, we use datasets derived from the Folktables package (Ding et al., 2021), which defines datasets using samples from the American Community Survey (ACS). |
| Dataset Splits | No | For the main experiments, the paper mentions using datasets from the Folktables package but does not specify any training/validation/test splits. For the ML evaluation in Appendix C, it states 'dividing each dataset into a training and test set, using an 80/20 partition,' but no separate validation split is mentioned. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using the 'Folktables package (Ding et al., 2021)' but does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | Table 7, 'Hyperparameters experiments (with adaptivity)', lists detailed hyperparameter values for the various methods: Data Size (N), Pmut, Pcross, Elite Size, Max Generations, Queries Sampled (K), Learning Rate, Inverse Temp. (σit), # Product Mixtures (K), Batch Size (B), Max Iterations (M), # Samples, and T (adaptive epochs). |
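
The 80/20 training/test partition noted under Dataset Splits can be illustrated with a short sketch. This is not the paper's code: the helper name `split_80_20`, the fixed seed, and the toy rows are assumptions made for illustration, since the paper's shuffling and seeding details are not given here.

```python
import random


def split_80_20(rows, seed=0):
    """Shuffle rows and return (train, test) in an 80/20 partition.

    Hypothetical helper: the seed and the shuffling strategy are
    assumptions, not details taken from the paper.
    """
    rng = random.Random(seed)          # local RNG so the split is repeatable
    shuffled = list(rows)              # copy so the caller's data is untouched
    rng.shuffle(shuffled)
    cut = (len(shuffled) * 4) // 5     # integer index at the 80% mark
    return shuffled[:cut], shuffled[cut:]


rows = list(range(100))                # toy stand-in for dataset rows
train, test = split_80_20(rows)
print(len(train), len(test))
```

Fixing the seed keeps the partition repeatable across runs. A separate validation set, which the summary above notes is not mentioned, would need a second cut of the training portion.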