Differentially Private Synthetic Data via Foundation Model APIs 1: Images

Authors: Zinan Lin, Sivakanth Gopi, Janardhan Kulkarni, Harsha Nori, Sergey Yekhanin

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct the first exploration of the potential and limits of this vision on DP synthetic images. Surprisingly, we show that not only is such a vision realizable, but that it also has the potential to match or improve SOTA training-based DP synthetic image algorithms despite more restrictive model access. Our contributions are: (3) Experimental results (§5).
Researcher Affiliation | Industry | Zinan Lin (Microsoft Research, zinanlin@microsoft.com); Sivakanth Gopi (Microsoft Research, sigopi@microsoft.com); Janardhan Kulkarni (Microsoft Research, jakul@microsoft.com); Harsha Nori (Microsoft Research, hanori@microsoft.com); Sergey Yekhanin (Microsoft Research, yekhanin@microsoft.com)
Pseudocode | Yes | Algorithm 1: Private Evolution (PE) ... Algorithm 2: DP Nearest Neighbors Histogram (DP NN HISTOGRAM) ... Alg. 3 shows the full algorithm. ... Algorithm 4: Private Evolution (PE) for both labeled and unlabeled data.
Open Source Code | Yes | The code and data are released at https://github.com/microsoft/DPSDA.
Open Datasets | Yes | For example, on CIFAR10 (with ImageNet as the public data)... We treat CIFAR10 (Krizhevsky et al., 2009) as private data. We use Camelyon17 dataset (Bandi et al., 2018; Koh et al., 2021) as private data... The code and data are released at https://github.com/microsoft/DPSDA.
Dataset Splits | No | The paper mentions training a downstream classifier on generated samples and testing on the CIFAR10 test set, but it does not specify explicit training/validation/test splits for its datasets or downstream tasks.
Hardware Specification | Yes | To ensure a fair comparison, we estimate the runtime of both algorithms using 1 NVIDIA V100 32GB GPU.
Software Dependencies | No | The paper mentions using "Opacus library (Yousefpour et al., 2021)" but does not provide specific version numbers for this or any other software components (e.g., PyTorch, Python).
Experiment Setup | Yes | Detailed hyper-parameter settings and more results such as generated samples and their nearest images in the private dataset are in Apps. J to L. ... Hyperparameters. We set number of iterations T = 20, lookahead degree k = 8, and number of generated samples N_syn = 50000. For RANDOM API and VARIATION API, we use DDIM sampler with 100 steps.
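
For readers who want a feel for what the Pseudocode and Experiment Setup entries refer to, the following is a minimal sketch of a Private Evolution style loop built around a DP nearest-neighbor histogram. It is not the released implementation at https://github.com/microsoft/DPSDA. Several things here are assumptions made for illustration: samples are plain 2-D vectors rather than images, distance is Euclidean on raw coordinates rather than on embeddings, `toy_random_api` and `toy_variation_api` are hypothetical stand-ins for the RANDOM API and VARIATION API, and the lookahead step and privacy accounting are omitted entirely.

```python
import numpy as np


def dp_nn_histogram(private, synthetic, sigma, threshold, rng):
    """Noisy count of how many private points pick each synthetic point as nearest."""
    # Pairwise squared Euclidean distances, shape (n_private, n_synthetic).
    d2 = ((private[:, None, :] - synthetic[None, :, :]) ** 2).sum(axis=-1)
    # Each private point votes for its nearest synthetic point.
    votes = np.bincount(d2.argmin(axis=1), minlength=len(synthetic)).astype(float)
    # Gaussian noise on the counts, then a threshold to suppress noise-only bins.
    votes += rng.normal(0.0, sigma, size=votes.shape)
    return np.clip(votes - threshold, 0.0, None)


def private_evolution(private, random_api, variation_api,
                      n_syn, iterations, sigma, threshold, seed=0):
    """Resample synthetic points toward the private data, then perturb them."""
    rng = np.random.default_rng(seed)
    samples = random_api(n_syn, rng)
    for _ in range(iterations):
        hist = dp_nn_histogram(private, samples, sigma, threshold, rng)
        total = hist.sum()
        # Fall back to uniform resampling if every bin was thresholded to zero.
        probs = hist / total if total > 0 else np.full(n_syn, 1.0 / n_syn)
        picked = samples[rng.choice(n_syn, size=n_syn, p=probs)]
        samples = variation_api(picked, rng)
    return samples


# Hypothetical stand-ins for the foundation-model APIs (assumptions, not real APIs).
def toy_random_api(n, rng):
    return rng.normal(0.0, 3.0, size=(n, 2))


def toy_variation_api(x, rng):
    return x + rng.normal(0.0, 0.1, size=x.shape)
```

In this sketch, the reported settings T = 20 and N_syn = 50000 would correspond to `iterations=20` and `n_syn=50000`; the lookahead degree k = 8 and the 100-step DDIM sampler have no counterpart in the toy APIs. The dense distance matrix is only practical for small inputs.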