Privacy-Preserving Data Release Leveraging Optimal Transport and Particle Gradient Descent
Authors: Konstantin Donhauser, Javier Abad, Neha Hulkund, Fanny Yang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present a systematic large-scale experimental evaluation of our algorithm. ... We use 9 real-world datasets from various sources, detailed in Appendix A. ... We evaluate the statistical and downstream task performance of our algorithm with the following standard metrics for DP data synthesis: |
| Researcher Affiliation | Academia | 1Department of Computer Science, ETH Zurich, Switzerland 2MIT CSAIL, Boston, USA. |
| Pseudocode | Yes | Algorithm 1 Standard data synthesis framework; Algorithm 2 Extension to sequential query selection; Algorithm 3 Private Particle Gradient Descent |
| Open Source Code | Yes | 1See our GitHub repository for the source code: https://github.com/jaabmar/private-pgd. |
| Open Datasets | Yes | We use 9 real-world datasets from various sources, detailed in Appendix A. Each dataset contains no fewer than 50,000 data points... [Appendix A details datasets such as] ACS Income classification dataset (Inc.) (Ding et al., 2021). ... Medical charges regression dataset (Med.) (Grinsztajn et al., 2022). |
| Dataset Splits | No | For these evaluations, we allocate 80% of the data as private data D and use the remaining 20% for test data Dtest. The paper does not specify a separate validation split. |
| Hardware Specification | Yes | Our cluster consists of modern GPUs, at least of the NVIDIA GeForce RTX 2080 Ti type |
| Software Dependencies | No | The paper states: 'We implement PrivPGD using PyTorch' and 'We use the implementation from scikit-learn (Buitinck et al., 2013) for Gradient Boosting', but does not provide specific version numbers for PyTorch, scikit-learn, or other software dependencies. |
| Experiment Setup | Yes | We implement PrivPGD using PyTorch and use the same hyperparameters for all experiments. We select all 2-way marginals and use the Gaussian mechanism to construct DP-copies of them... We then generate a dataset by running PrivPGD (Algorithm 3) with 100k particles. ... we use 200 MC random projections... running gradient descent for 1750 iterations using Adam with an initial learning rate of 0.1 and a linear learning rate scheduler with step size 100 and multiplicative factor 0.8. ... We minimize the objective in Equation (7) by running gradient descent for 1000 epochs... using an initial learning rate of 0.1 and a linear learning rate scheduler with step size 50 and multiplicative factor 0.75. We divide S into mini-batches of size 5 and randomly set 80% of the gradient entries to zero. |
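
The Experiment Setup row states that all 2-way marginals are selected and privatized with the Gaussian mechanism. Below is a minimal sketch of that privatization step, not the authors' implementation: the function name `noisy_two_way_marginal`, the use of raw contingency-table counts, and the noise scale `sigma` are illustrative assumptions; the paper's exact sensitivity calibration and privacy accounting live in the linked GitHub repository.

```python
import numpy as np

def noisy_two_way_marginal(col_i, col_j, n_levels_i, n_levels_j, sigma, rng=None):
    """Hypothetical sketch: build the contingency table (2-way marginal) of two
    discrete columns and release a DP copy via the Gaussian mechanism.
    `sigma` must be calibrated to the target (epsilon, delta) budget; that
    calibration is not reproduced here."""
    rng = np.random.default_rng() if rng is None else rng
    counts = np.zeros((n_levels_i, n_levels_j))
    np.add.at(counts, (col_i, col_j), 1.0)                       # exact marginal counts
    return counts + rng.normal(scale=sigma, size=counts.shape)   # add Gaussian noise
```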
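The same row quotes the optimization hyperparameters for PrivPGD: 100k particles, 200 Monte Carlo random projections, and 1750 iterations of Adam with an initial learning rate of 0.1 under a step scheduler (step size 100, multiplicative factor 0.8). The PyTorch sketch below only wires those quoted numbers together; the objective `sliced_sq_distance` is a simplified stand-in that compares particles against a reference point cloud rather than against noisy marginals as in the paper's Equation (7), and the choice of `StepLR` as the scheduler is an interpretation of the quoted description.

```python
import torch

def sliced_sq_distance(x, y, num_projections=200):
    """Stand-in objective: Monte Carlo sliced squared distance between two
    equally sized point clouds x and y of shape (n, d). The paper instead
    matches particles to privatized marginals; this is only illustrative."""
    theta = torch.randn(num_projections, x.shape[1], device=x.device)
    theta = theta / theta.norm(dim=1, keepdim=True)    # random unit directions
    px = torch.sort(x @ theta.T, dim=0).values         # sorted 1-D projections
    py = torch.sort(y @ theta.T, dim=0).values
    return ((px - py) ** 2).mean()

# Illustrative shapes; the paper uses 100k particles and a dataset-dependent dimension.
n_particles, dim = 100_000, 8
particles = torch.randn(n_particles, dim, requires_grad=True)   # synthetic data points
reference = torch.randn(n_particles, dim)                       # placeholder DP target

optimizer = torch.optim.Adam([particles], lr=0.1)
# "learning rate scheduler with step size 100 and multiplicative factor 0.8"
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.8)

for _ in range(1750):
    optimizer.zero_grad()
    loss = sliced_sq_distance(particles, reference, num_projections=200)
    loss.backward()
    optimizer.step()
    scheduler.step()
```

The finetuning stage quoted in the same row (1000 epochs, step size 50, factor 0.75, mini-batches of size 5 with 80% of gradient entries zeroed) follows the same Adam/StepLR pattern and is not sketched separately here.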