Attack-Aware Noise Calibration for Differential Privacy

Authors: Bogdan Kulynych, Juan Felipe Gomez, Georgios Kaissis, Flavio Calmon, Carmela Troncoso

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate that calibrating noise to attack sensitivity/specificity, rather than ε, when training privacy-preserving ML models substantially improves model accuracy for the same risk level. Our work provides a principled and practical way to improve the utility of privacy-preserving ML without compromising on privacy. [...] 4 Experiments In this section, we empirically evaluate the utility improvement of our calibration method over traditional approaches.
Researcher Affiliation | Academia | Bogdan Kulynych (Lausanne University Hospital, CHUV); Juan Felipe Gomez (Harvard University); Georgios Kaissis (Technical University Munich); Flavio du Pin Calmon (Harvard University); Carmela Troncoso (EPFL)
Pseudocode | Yes | Algorithm 1: Construct the trade-off curve using discrete privacy loss random variables (X, Y)
Open Source Code | Yes | We provide a Python package which implements our algorithms for analyzing DP mechanisms in terms of the interpretable f-DP guarantees, and calibrating to operational risks: github.com/Felipe-Gomez/riskcal
Open Datasets | Yes | GPT-2 on SST-2 (text sentiment classification)... CNN on CIFAR-10 (image classification)... (Figure 1) [...] ADULT dataset (Becker and Kohavi, 1996) comprising a small set of US Census data.
Dataset Splits | Yes | We use the default training split of the SST-2 dataset containing 67,348 examples for finetuning, and the default validation split containing 872 examples as a test set.
Hardware Specification | Yes | We use a commodity machine with an AMD Ryzen 5 2600 six-core CPU, 16GB of RAM, and an Nvidia GeForce RTX 4070 GPU with 16GB of VRAM to run our experiments.
Software Dependencies | No | The paper lists software such as "PyTorch (Paszke et al., 2019)", "numpy (Harris et al., 2020)", and "scipy (Virtanen et al., 2020)", but these citations refer to the papers describing the software rather than specifying the exact versions used for the experiments.
Experiment Setup | Yes | For sentiment classification, we fine-tune GPT-2 (small) (Radford et al., 2019) using LoRA (Hu et al., 2021) with DP-SGD on the SST-2 sentiment classification task (Socher et al., 2013)... We use the Poisson subsampling probability p = 0.004 corresponding to an expected batch size of 256, a gradient clipping ℓ2-norm of 1.0, and finetune for three epochs with LoRA of dimension 4 and scaling factor of 32. We vary the noise multiplier σ ∈ {0.5715, 0.6072, 0.6366, 0.6945, 0.7498}, approximately corresponding to ε ∈ {3.95, 3.2, 2.7, 1.9, 1.45}, respectively, at δ = 10⁻⁵.
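The paper's central objects are the trade-off curve of an (ε, δ)-DP mechanism and the attack risk it implies. As a minimal sketch of that conversion (helper names are mine, not the riskcal package API), the standard (ε, δ)-DP trade-off function and the resulting tight bound on membership-inference advantage can be written down directly and evaluated at the ε values listed in the experiment setup:

```python
import math

def tradeoff_fdp(alpha, eps, delta):
    """Trade-off curve f(alpha) of an (eps, delta)-DP mechanism:
    the smallest false-negative rate any membership-inference attack
    can achieve at false-positive rate alpha."""
    return max(0.0,
               1.0 - delta - math.exp(eps) * alpha,
               math.exp(-eps) * (1.0 - delta - alpha))

def max_attack_advantage(eps, delta):
    """Tight upper bound on attack advantage (TPR - FPR) implied by
    (eps, delta)-DP; maximizing 1 - alpha - f(alpha) over alpha gives
    (e^eps - 1 + 2*delta) / (e^eps + 1)."""
    return (math.exp(eps) - 1.0 + 2.0 * delta) / (math.exp(eps) + 1.0)

if __name__ == "__main__":
    delta = 1e-5
    for eps in [3.95, 3.2, 2.7, 1.9, 1.45]:
        print(f"eps = {eps:>4}: advantage <= {max_attack_advantage(eps, delta):.3f}, "
              f"f(0.05) = {tradeoff_fdp(0.05, eps, delta):.3f}")
```

This illustrates the paper's motivation: the worst-case attack advantage changes smoothly with ε, so calibrating noise directly to a target sensitivity/specificity (a point on f) rather than to ε itself can admit smaller σ at the same operational risk.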