Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Differentially Private Synthetic Data via Foundation Model APIs 1: Images

Authors: Zinan Lin, Sivakanth Gopi, Janardhan Kulkarni, Harsha Nori, Sergey Yekhanin

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct the first exploration of the potential and limits of this vision on DP synthetic images. Surprisingly, we show that not only is such a vision realizable, but that it also has the potential to match or improve SOTA training-based DP synthetic image algorithms despite more restrictive model access. Our contributions are: (3) Experimental results (§5).
Researcher Affiliation | Industry | Zinan Lin (Microsoft Research); Sivakanth Gopi (Microsoft Research); Janardhan Kulkarni (Microsoft Research); Harsha Nori (Microsoft Research); Sergey Yekhanin (Microsoft Research)
Pseudocode | Yes | Algorithm 1: Private Evolution (PE) ... Algorithm 2: DP Nearest Neighbors Histogram (DP_NN_HISTOGRAM) ... Alg. 3 shows the full algorithm. ... Algorithm 4: Private Evolution (PE) for both labeled and unlabeled data.
Open Source Code | Yes | The code and data are released at https://github.com/microsoft/DPSDA.
Open Datasets | Yes | For example, on CIFAR10 (with ImageNet as the public data)... We treat CIFAR10 (Krizhevsky et al., 2009) as private data. We use Camelyon17 dataset (Bandi et al., 2018; Koh et al., 2021) as private data... The code and data are released at https://github.com/microsoft/DPSDA.
Dataset Splits | No | The paper mentions training a downstream classifier on generated samples and testing on the CIFAR10 test set, but it does not specify explicit training/validation/test splits for its datasets or downstream tasks.
Hardware Specification | Yes | To ensure a fair comparison, we estimate the runtime of both algorithms using 1 NVIDIA V100 32GB GPU.
Software Dependencies | No | The paper mentions using the "Opacus library (Yousefpour et al., 2021)" but does not provide specific version numbers for this or any other software component (e.g., PyTorch, Python).
Experiment Setup | Yes | Detailed hyper-parameter settings and more results such as generated samples and their nearest images in the private dataset are in Apps. J to L. ... Hyperparameters. We set number of iterations T = 20, lookahead degree k = 8, and number of generated samples N_syn = 50000. For RANDOM_API and VARIATION_API, we use DDIM sampler with 100 steps.
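
The hyperparameters quoted in the Experiment Setup row can be collected in a small configuration object for anyone attempting a reproduction. The following is only a minimal sketch: the class name PEConfig and all field names are hypothetical and are not taken from the paper or from the DPSDA repository; only the values (T = 20, k = 8, 50000 generated samples, DDIM with 100 steps) come from the quoted text.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class PEConfig:
    """Hypothetical container for the hyperparameters quoted in the report.

    Field names are illustrative assumptions, not the repository's own names.
    """
    num_iterations: int = 20             # T in the quoted setup
    lookahead_degree: int = 8            # k in the quoted setup
    num_synthetic_samples: int = 50_000  # number of generated samples
    sampler: str = "ddim"                # sampler quoted for both APIs
    sampler_steps: int = 100             # DDIM steps

    def __post_init__(self) -> None:
        # Reject non-positive counts early so a typo fails fast.
        for name in ("num_iterations", "lookahead_degree",
                     "num_synthetic_samples", "sampler_steps"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")


config = PEConfig()
print(asdict(config))
```

Freezing the dataclass keeps a recorded run configuration from being mutated by accident, and asdict makes it straightforward to log the settings alongside results.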