Differentially Private Synthetic Data via Foundation Model APIs 1: Images
Authors: Zinan Lin, Sivakanth Gopi, Janardhan Kulkarni, Harsha Nori, Sergey Yekhanin
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct the first exploration of the potential and limits of this vision on DP synthetic images. Surprisingly, we show that not only is such a vision realizable, but that it also has the potential to match or improve SOTA training-based DP synthetic image algorithms despite more restrictive model access. Our contributions are: ... (3) Experimental results (§5). |
| Researcher Affiliation | Industry | Zinan Lin (Microsoft Research, zinanlin@microsoft.com); Sivakanth Gopi (Microsoft Research, sigopi@microsoft.com); Janardhan Kulkarni (Microsoft Research, jakul@microsoft.com); Harsha Nori (Microsoft Research, hanori@microsoft.com); Sergey Yekhanin (Microsoft Research, yekhanin@microsoft.com) |
| Pseudocode | Yes | Algorithm 1: Private Evolution (PE) ... Algorithm 2: DP Nearest Neighbors Histogram (DP NN HISTOGRAM) ... Alg. 3 shows the full algorithm. ... Algorithm 4: Private Evolution (PE) for both labeled and unlabeled data. |
| Open Source Code | Yes | The code and data are released at https://github.com/microsoft/DPSDA. |
| Open Datasets | Yes | For example, on CIFAR10 (with ImageNet as the public data)... We treat CIFAR10 (Krizhevsky et al., 2009) as private data. We use the Camelyon17 dataset (Bandi et al., 2018; Koh et al., 2021) as private data... The code and data are released at https://github.com/microsoft/DPSDA. |
| Dataset Splits | No | The paper mentions training a downstream classifier on generated samples and testing on the CIFAR10 test set, but it does not specify explicit training/validation/test splits for its datasets or downstream tasks within the paper. |
| Hardware Specification | Yes | To ensure a fair comparison, we estimate the runtime of both algorithms using 1 NVIDIA V100 32GB GPU. |
| Software Dependencies | No | The paper mentions using "Opacus library (Yousefpour et al., 2021)" but does not provide specific version numbers for this or any other software components (e.g., PyTorch, Python). |
| Experiment Setup | Yes | Detailed hyper-parameter settings and more results such as generated samples and their nearest images in the private dataset are in Apps. J to L. ... Hyperparameters. We set the number of iterations T = 20, lookahead degree k = 8, and the number of generated samples N_syn = 50000. For RANDOM API and VARIATION API, we use the DDIM sampler with 100 steps. |
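To make the DP Nearest Neighbors Histogram step listed under Pseudocode more concrete, here is a minimal NumPy sketch of the general technique: each private sample votes for its nearest synthetic sample, Gaussian noise is added to the vote counts, and small counts are thresholded away. It assumes embeddings for private and synthetic images are already computed and uses squared Euclidean distance; the function names (`dp_nn_histogram`, `resample_indices`) and parameter names (`sigma`, `threshold`) are hypothetical, and this is an illustration, not the released DPSDA code or the paper's exact algorithm. Privacy accounting (choosing `sigma` for a target (ε, δ) over T iterations) is out of scope here.

```python
import numpy as np


def dp_nn_histogram(private_emb, synthetic_emb, sigma, threshold, rng=None):
    """Noisy nearest-neighbor vote histogram (illustrative sketch).

    private_emb: (N_priv, d) embeddings of private samples (assumed precomputed).
    synthetic_emb: (N_syn, d) embeddings of the current synthetic samples.
    sigma: std. dev. of the Gaussian noise (hypothetical parameter name).
    threshold: amount subtracted from every noisy count before clipping at 0.
    """
    rng = np.random.default_rng() if rng is None else rng
    private_emb = np.asarray(private_emb, dtype=float)
    synthetic_emb = np.asarray(synthetic_emb, dtype=float)
    # Pairwise squared Euclidean distances, shape (N_priv, N_syn).
    diffs = private_emb[:, None, :] - synthetic_emb[None, :, :]
    dists = (diffs ** 2).sum(axis=-1)
    # Each private sample casts one vote for its nearest synthetic sample.
    nearest = dists.argmin(axis=1)
    hist = np.bincount(nearest, minlength=len(synthetic_emb)).astype(float)
    # Gaussian mechanism: adding/removing one private sample changes a single
    # bin by 1, so the L2 sensitivity of the histogram is 1.
    hist += rng.normal(0.0, sigma, size=hist.shape)
    # Subtract the threshold and clip at zero to suppress noise-only bins.
    return np.maximum(hist - threshold, 0.0)


def resample_indices(hist, n_samples, rng=None):
    """Pick synthetic samples to keep, proportionally to their noisy votes."""
    rng = np.random.default_rng() if rng is None else rng
    total = hist.sum()
    # Fall back to uniform sampling if every bin was thresholded to zero.
    probs = hist / total if total > 0 else np.full(len(hist), 1.0 / len(hist))
    return rng.choice(len(hist), size=n_samples, p=probs)


if __name__ == "__main__":
    priv = [[0.0, 0.0], [0.1, 0.0], [10.0, 10.0]]
    syn = [[0.0, 0.0], [10.0, 10.0], [5.0, 5.0]]
    h = dp_nn_histogram(priv, syn, sigma=0.0, threshold=0.5)
    print(h)  # [1.5 0.5 0. ]
    print(resample_indices(h, n_samples=5, rng=np.random.default_rng(0)))
```

In a PE-style loop, the selected synthetic samples would then be passed to a variation API to produce the next generation; since only the noisy histogram touches private data, the later steps are post-processing. The sketch omits the lookahead (degree k) refinement, in which a synthetic sample's embedding is derived from k generated variations of it rather than from the sample alone.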