Privacy without Noisy Gradients: Slicing Mechanism for Generative Model Training
Authors: Kristjan Greenewald, Yuancheng Yu, Hao Wang, Kai Xu
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive numerical experiments demonstrate that our approach can generate synthetic data of higher quality compared with baselines. |
| Researcher Affiliation | Collaboration | Kristjan Greenewald (MIT-IBM Watson AI Lab, IBM Research, kristjan.h.greenewald@ibm.com); Yuancheng Yu (UIUC, yyu51@illinois.edu); Hao Wang (MIT-IBM Watson AI Lab, IBM Research, hao@ibm.com); Kai Xu (MIT-IBM Watson AI Lab, IBM Research, xuk@ibm.com) |
| Pseudocode | Yes | Algorithm 1: Training DP generative models with the smoothed-sliced f-divergence. |
| Open Source Code | No | The paper does not provide a link to a source code repository or an explicit statement about releasing the code for this work. |
| Open Datasets | Yes | We validate both our method and baselines using the US Census data derived from the American Community Survey (ACS) Public Use Microdata Sample (PUMS). Using the API of the Folktables package [DHMS21], we access the 2018 California data. (See the data-access sketch after this table.) |
| Dataset Splits | No | The paper mentions training and testing data, and subsampling for privacy amplification, but does not explicitly provide details for a validation split or its proportion. |
| Hardware Specification | Yes | For our method and baselines, each model was trained using a V100 GPU, with runtimes typically less than 2 hours for our method (200 epochs). |
| Software Dependencies | No | The paper mentions using an 'open-source Python library [Sma23]' and the 'Folktables package [DHMS21]', but does not provide specific version numbers for Python, PyTorch, or other key software dependencies. |
| Experiment Setup | Yes | For our method and Slice Wass, all experiments used a batch size of 128 and a learning rate of 2 × 10⁻⁵, and ran for 200 epochs. (See the training-configuration sketch after this table.) |
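
As a companion to the Open Datasets row, here is a minimal sketch of pulling the 2018 California ACS PUMS data through the Folktables API, matching the paper's description. The choice of the `ACSIncome` prediction task is an illustrative assumption; the paper may define its tabular task differently.

```python
# Minimal sketch: fetching 2018 California ACS PUMS data via Folktables,
# per the paper's dataset setup. The ACSIncome task is an assumption for
# illustration, not necessarily the paper's exact task definition.
from folktables import ACSDataSource, ACSIncome

# Download the 2018 1-year person-level survey for California.
data_source = ACSDataSource(survey_year="2018", horizon="1-Year", survey="person")
acs_data = data_source.get_data(states=["CA"], download=True)

# Convert raw PUMS records into features/labels for one predefined task.
features, labels, _ = ACSIncome.df_to_numpy(acs_data)
print(features.shape, labels.shape)
```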
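
The Experiment Setup row can likewise be read as a concrete training configuration. Below is a minimal PyTorch sketch wiring the reported hyperparameters (batch size 128, learning rate 2 × 10⁻⁵, 200 epochs) into a generator training loop; the generator architecture, the synthetic data tensor, and `placeholder_loss` are hypothetical stand-ins, since the paper's smoothed-sliced f-divergence objective is not specified in this report.

```python
# Sketch of the reported training configuration. The architecture and loss
# are placeholders; only the hyperparameters below come from the paper.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

BATCH_SIZE = 128      # reported in the paper
LEARNING_RATE = 2e-5  # reported in the paper
EPOCHS = 200          # reported in the paper

device = "cuda" if torch.cuda.is_available() else "cpu"  # paper used a V100 GPU

# Placeholder data; the paper trains on ACS PUMS features.
data = torch.randn(10_000, 32)
loader = DataLoader(TensorDataset(data), batch_size=BATCH_SIZE, shuffle=True)

# Hypothetical generator mapping 16-dim noise to 32-dim records.
generator = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 32)).to(device)
optimizer = torch.optim.Adam(generator.parameters(), lr=LEARNING_RATE)

def placeholder_loss(fake: torch.Tensor, real: torch.Tensor) -> torch.Tensor:
    # Stand-in for the smoothed-sliced f-divergence objective (not given here):
    # a simple moment-matching penalty so the loop runs end to end.
    return (fake.mean(0) - real.mean(0)).pow(2).sum()

for epoch in range(EPOCHS):
    for (real,) in loader:
        real = real.to(device)
        noise = torch.randn(real.size(0), 16, device=device)
        loss = placeholder_loss(generator(noise), real)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```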