Privacy without Noisy Gradients: Slicing Mechanism for Generative Model Training

Authors: Kristjan Greenewald, Yuancheng Yu, Hao Wang, Kai Xu

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive numerical experiments demonstrate that our approach can generate synthetic data of higher quality compared with baselines.
Researcher Affiliation | Collaboration | Kristjan Greenewald (MIT-IBM Watson AI Lab, IBM Research; kristjan.h.greenewald@ibm.com); Yuancheng Yu (UIUC; yyu51@illinois.edu); Hao Wang (MIT-IBM Watson AI Lab, IBM Research; hao@ibm.com); Kai Xu (MIT-IBM Watson AI Lab, IBM Research; xuk@ibm.com)
Pseudocode | Yes | Algorithm 1: Training DP generative models with the smoothed-sliced f-divergence. (A hedged illustration of the slicing idea follows this table.)
Open Source Code | No | The paper does not provide a direct link to a source code repository or an explicit statement about releasing the code for the work described in this paper.
Open Datasets | Yes | We validate both our method and baselines using the US Census data derived from the American Community Survey (ACS) Public Use Microdata Sample (PUMS). Using the API of the Folktables package [DHMS21], we access the 2018 California data. (See the Folktables sketch after this table.)
Dataset Splits | No | The paper mentions training and testing data, and subsampling for privacy amplification, but does not explicitly provide details of a validation split or its proportion.
Hardware Specification | Yes | For our method and baselines, each model was trained using a V100 GPU, with runtimes typically less than 2 hours for our method (200 epochs).
Software Dependencies | No | The paper mentions using an 'open-source Python library [Sma23]' and the 'Folktables package [DHMS21]' but does not provide specific version numbers for Python, PyTorch, or other key software dependencies.
Experiment Setup | Yes | For our method and Slice Wass, all experiments used a batch size of 128 and a learning rate of 2 × 10⁻⁵, and ran for 200 epochs. (See the configuration sketch after this table.)
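Algorithm 1 itself is not reproduced on this page. As a rough, hedged illustration of the general idea behind adding noise to sliced statistics rather than to gradients, the sketch below projects private data onto random directions and perturbs those projections once, so that later generator updates only ever touch the noisy slices. It is a simplified stand-in, not the paper's algorithm; the number of slices, the noise scale `sigma`, and the downstream loss are placeholders.

```python
# Illustrative sketch only: privatize random 1-D projections ("slices") of the
# private data once, then train on the noisy projections instead of adding
# noise at every gradient step. Not the paper's Algorithm 1; num_slices and
# sigma are placeholders.
import numpy as np

rng = np.random.default_rng(0)

def noisy_slices(private_data, num_slices=50, sigma=1.0):
    """Project data onto random unit directions and add Gaussian noise once."""
    n, d = private_data.shape
    directions = rng.normal(size=(d, num_slices))
    directions /= np.linalg.norm(directions, axis=0, keepdims=True)
    projections = private_data @ directions              # shape (n, num_slices)
    return projections + rng.normal(scale=sigma, size=projections.shape), directions

real = rng.normal(size=(1000, 10))       # stands in for the private dataset
noisy_proj, dirs = noisy_slices(real)
# A generator would then be trained to match these fixed noisy projections
# (e.g. via an f-divergence estimate), with no per-iteration gradient noise.
```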
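The ACS PUMS access route described in the Open Datasets row follows the standard Folktables workflow. A minimal sketch, assuming the `folktables` package is installed and using the ACSIncome task purely for illustration (the paper's exact task and feature set are not specified on this page):

```python
# Minimal sketch of loading 2018 California ACS PUMS data via Folktables.
# Assumes `pip install folktables`; the ACSIncome task is illustrative only.
from folktables import ACSDataSource, ACSIncome

data_source = ACSDataSource(survey_year='2018', horizon='1-Year', survey='person')
ca_data = data_source.get_data(states=["CA"], download=True)

# Convert to feature/label arrays for a downstream generative-model pipeline.
features, labels, _ = ACSIncome.df_to_numpy(ca_data)
print(features.shape, labels.shape)
```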
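The hyperparameters in the Experiment Setup row map directly onto a standard PyTorch training configuration. The sketch below takes only the batch size, learning rate, and epoch count from the paper; the Adam optimizer, the toy generator, and the stand-in loss are assumptions for illustration and do not implement the smoothed-sliced f-divergence.

```python
# Sketch of the reported training configuration: batch size 128, learning
# rate 2e-5, 200 epochs, single GPU (a V100 in the paper). The optimizer,
# generator, and loss below are placeholders, not the paper's method.
import torch
from torch.utils.data import DataLoader, TensorDataset

BATCH_SIZE, LEARNING_RATE, EPOCHS = 128, 2e-5, 200
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

data = torch.randn(10_000, 32)                        # placeholder tabular data
loader = DataLoader(TensorDataset(data), batch_size=BATCH_SIZE, shuffle=True)

generator = torch.nn.Sequential(                      # placeholder generator
    torch.nn.Linear(16, 64), torch.nn.ReLU(), torch.nn.Linear(64, 32)
).to(device)
optimizer = torch.optim.Adam(generator.parameters(), lr=LEARNING_RATE)

for epoch in range(EPOCHS):
    for (batch,) in loader:
        batch = batch.to(device)
        noise = torch.randn(batch.size(0), 16, device=device)
        fake = generator(noise)
        loss = (fake.mean(0) - batch.mean(0)).pow(2).mean()  # stand-in loss only
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```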