Privacy-Preserving Data Release Leveraging Optimal Transport and Particle Gradient Descent
Authors: Konstantin Donhauser, Javier Abad, Neha Hulkund, Fanny Yang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present a systematic large-scale experimental evaluation of our algorithm. ... We use 9 real-world datasets from various sources, detailed in Appendix A. ... We evaluate the statistical and downstream task performance of our algorithm with the following standard metrics for DP data synthesis: |
| Researcher Affiliation | Academia | 1Department of Computer Science, ETH Zurich, Switzerland 2MIT CSAIL, Boston, USA. |
| Pseudocode | Yes | Algorithm 1 Standard data synthesis framework; Algorithm 2 Extension to sequential query selection; Algorithm 3 Private Particle Gradient Descent |
| Open Source Code | Yes | 1See our GitHub repository for the source code: https://github.com/jaabmar/private-pgd. |
| Open Datasets | Yes | We use 9 real-world datasets from various sources, detailed in Appendix A. Each dataset contains no fewer than 50,000 data points... [Appendix A details datasets such as] ACS Income classification dataset (Inc.) (Ding et al., 2021). ... Medical charges regression dataset (Med.) (Grinsztajn et al., 2022). |
| Dataset Splits | No | For these evaluations, we allocate 80% of the data as private data D and use the remaining 20% for test data Dtest. The paper does not specify a separate validation split. |
| Hardware Specification | Yes | Our cluster consists of modern GPUs, at least of the NVIDIA GeForce RTX 2080 Ti type |
| Software Dependencies | No | The paper states: 'We implement PrivPGD using PyTorch' and 'We use the implementation from scikit-learn (Buitinck et al., 2013) for Gradient Boosting', but does not provide specific version numbers for PyTorch, scikit-learn, or other software dependencies. |
| Experiment Setup | Yes | We implement PrivPGD using PyTorch and use the same hyperparameters for all experiments. We select all 2-way marginals and use the Gaussian mechanism to construct DP-copies of them... We then generate a dataset by running PrivPGD (Algorithm 3) with 100k particles. ... we use 200 MC random projections... running gradient descent for 1750 iterations using Adam with an initial learning rate of 0.1 and a linear learning rate scheduler with step size 100 and multiplicative factor 0.8. ... We minimize the objective in Equation (7) by running gradient descent for 1000 epochs... using an initial learning rate of 0.1 and a linear learning rate scheduler with step size 50 and multiplicative factor 0.75. We divide S into mini-batches of size 5 and randomly set 80% of the gradient entries to zero. |
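
The Experiment Setup row states that all 2-way marginals are selected and privatized with the Gaussian mechanism. Below is a minimal sketch of that privatization step, not the authors' implementation: the function name `noisy_two_way_marginal`, the use of raw contingency-table counts, and the noise scale `sigma` are illustrative assumptions; the paper's exact sensitivity calibration and privacy accounting live in the linked GitHub repository.

```python
import numpy as np

def noisy_two_way_marginal(col_i, col_j, n_levels_i, n_levels_j, sigma, rng=None):
    """Hypothetical sketch: build the contingency table (2-way marginal) of two
    discrete columns and release a DP copy via the Gaussian mechanism.
    `sigma` must be calibrated to the target (epsilon, delta) budget; that
    calibration is not reproduced here."""
    rng = np.random.default_rng() if rng is None else rng
    counts = np.zeros((n_levels_i, n_levels_j))
    np.add.at(counts, (col_i, col_j), 1.0)                       # exact marginal counts
    return counts + rng.normal(scale=sigma, size=counts.shape)   # add Gaussian noise
```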
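The same row quotes the optimization hyperparameters for PrivPGD: 100k particles, 200 Monte Carlo random projections, and 1750 iterations of Adam with an initial learning rate of 0.1 under a step scheduler (step size 100, multiplicative factor 0.8). The PyTorch sketch below only wires those quoted numbers together; the objective `sliced_sq_distance` is a simplified stand-in that compares particles against a reference point cloud rather than against noisy marginals as in the paper's Equation (7), and the choice of `StepLR` as the scheduler is an interpretation of the quoted description.

```python
import torch

def sliced_sq_distance(x, y, num_projections=200):
    """Stand-in objective: Monte Carlo sliced squared distance between two
    equally sized point clouds x and y of shape (n, d). The paper instead
    matches particles to privatized marginals; this is only illustrative."""
    theta = torch.randn(num_projections, x.shape[1], device=x.device)
    theta = theta / theta.norm(dim=1, keepdim=True)    # random unit directions
    px = torch.sort(x @ theta.T, dim=0).values         # sorted 1-D projections
    py = torch.sort(y @ theta.T, dim=0).values
    return ((px - py) ** 2).mean()

# Illustrative shapes; the paper uses 100k particles and a dataset-dependent dimension.
n_particles, dim = 100_000, 8
particles = torch.randn(n_particles, dim, requires_grad=True)   # synthetic data points
reference = torch.randn(n_particles, dim)                       # placeholder DP target

optimizer = torch.optim.Adam([particles], lr=0.1)
# "learning rate scheduler with step size 100 and multiplicative factor 0.8"
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.8)

for _ in range(1750):
    optimizer.zero_grad()
    loss = sliced_sq_distance(particles, reference, num_projections=200)
    loss.backward()
    optimizer.step()
    scheduler.step()
```

The finetuning stage quoted in the same row (1000 epochs, step size 50, factor 0.75, mini-batches of size 5 with 80% of gradient entries zeroed) follows the same Adam/StepLR pattern and is not sketched separately here.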