Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Differentially Private Synthetic Data via Foundation Model APIs 1: Images
Authors: Zinan Lin, Sivakanth Gopi, Janardhan Kulkarni, Harsha Nori, Sergey Yekhanin
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct the first exploration of the potential and limits of this vision on DP synthetic images. Surprisingly, we show that not only is such a vision realizable, but that it also has the potential to match or improve SOTA training-based DP synthetic image algorithms despite more restrictive model access. Our contributions are: (3) Experimental results (§5). |
| Researcher Affiliation | Industry | Zinan Lin Microsoft Research EMAIL Sivakanth Gopi Microsoft Research EMAIL Janardhan Kulkarni Microsoft Research EMAIL Harsha Nori Microsoft Research EMAIL Sergey Yekhanin Microsoft Research EMAIL |
| Pseudocode | Yes | Algorithm 1: Private Evolution (PE) ... Algorithm 2: DP Nearest Neighbors Histogram (DP NN HISTOGRAM) ... Alg. 3 shows the full algorithm. ... Algorithm 4: Private Evolution (PE) for both labeled and unlabeled data. |
| Open Source Code | Yes | The code and data are released at https://github.com/microsoft/DPSDA. |
| Open Datasets | Yes | For example, on CIFAR10 (with ImageNet as the public data)... We treat CIFAR10 (Krizhevsky et al., 2009) as private data. We use Camelyon17 dataset (Bandi et al., 2018; Koh et al., 2021) as private data... The code and data are released at https://github.com/microsoft/DPSDA. |
| Dataset Splits | No | The paper mentions training a downstream classifier on generated samples and testing on the CIFAR10 test set, but it does not specify explicit training/validation/test splits for its datasets or downstream tasks within the paper. |
| Hardware Specification | Yes | To ensure a fair comparison, we estimate the runtime of both algorithms using 1 NVIDIA V100 32GB GPU. |
| Software Dependencies | No | The paper mentions using "Opacus library (Yousefpour et al., 2021)" but does not provide specific version numbers for this or any other software components (e.g., PyTorch, Python). |
| Experiment Setup | Yes | Detailed hyper-parameter settings and more results such as generated samples and their nearest images in the private dataset are in Apps. J to L. ... Hyperparameters. We set number of iterations T = 20, lookahead degree k = 8, and number of generated samples Nsyn = 50000. For RANDOM API and VARIATION API, we use DDIM sampler with 100 steps. |
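
The settings quoted in the Experiment Setup row can be collected in one place when attempting a reproduction. The sketch below is only an illustration: the class name `PESetup` and its field names are hypothetical and do not come from the paper or the DPSDA repository, and the row does not say which experiment these values belong to, so they should be checked against Apps. J to L before use.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class PESetup:
    """Hypothetical container for the values quoted in the Experiment Setup row."""

    num_iterations: int = 20              # T in the quoted text
    lookahead_degree: int = 8             # k in the quoted text
    num_synthetic_samples: int = 50_000   # Nsyn in the quoted text
    sampler: str = "DDIM"                 # sampler used for RANDOM API and VARIATION API
    sampler_steps: int = 100              # DDIM steps

    def validate(self) -> None:
        """Reject non-positive integer settings (an assumed sanity check)."""
        for name, value in asdict(self).items():
            if isinstance(value, int) and value <= 0:
                raise ValueError(f"{name} must be positive, got {value}")


setup = PESetup()
setup.validate()
print(asdict(setup))
```

Freezing the dataclass keeps the recorded settings from being changed by accident during a run; a different setting would be expressed by constructing a new instance, for example `PESetup(num_iterations=10)`.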