Attack-Aware Noise Calibration for Differential Privacy
Authors: Bogdan Kulynych, Juan Felipe Gomez, Georgios Kaissis, Flavio du Pin Calmon, Carmela Troncoso
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate that calibrating noise to attack sensitivity/specificity, rather than ε, when training privacy-preserving ML models substantially improves model accuracy for the same risk level. Our work provides a principled and practical way to improve the utility of privacy-preserving ML without compromising on privacy. [...] 4 Experiments In this section, we empirically evaluate the utility improvement of our calibration method over traditional approaches. |
| Researcher Affiliation | Academia | Bogdan Kulynych (Lausanne University Hospital, CHUV); Juan Felipe Gomez (Harvard University); Georgios Kaissis (Technical University Munich); Flavio du Pin Calmon (Harvard University); Carmela Troncoso (EPFL) |
| Pseudocode | Yes | Algorithm 1 Construct the trade-off curve using discrete privacy loss random variables (X, Y) |
| Open Source Code | Yes | We provide a Python package which implements our algorithms for analyzing DP mechanisms in terms of the interpretable f-DP guarantees, and calibrating to operational risks: github.com/Felipe-Gomez/riskcal |
| Open Datasets | Yes | GPT-2 on SST-2 (text sentiment classification)... CNN on CIFAR-10 (image classification)... (Figure 1) [...] ADULT dataset (Becker and Kohavi, 1996) comprising a small set of US Census data. |
| Dataset Splits | Yes | We use the default training split of the SST-2 dataset containing 67,348 examples for finetuning, and the default validation split containing 872 examples as a test set. |
| Hardware Specification | Yes | We use a commodity machine with AMD Ryzen 5 2600 six-core CPU, 16GB of RAM, and an Nvidia GeForce RTX 4070 GPU with 16GB of VRAM to run our experiments. |
| Software Dependencies | No | The paper lists software such as "PyTorch (Paszke et al., 2019)", "numpy (Harris et al., 2020)", and "scipy (Virtanen et al., 2020)", but these citations refer to the papers describing the software rather than specifying the exact version numbers of the software used for the experiments. |
| Experiment Setup | Yes | For sentiment classification, we fine-tune GPT-2 (small) (Radford et al., 2019) using LoRA (Hu et al., 2021) with DP-SGD on the SST-2 sentiment classification task (Socher et al., 2013)... We use the Poisson subsampling probability p = 0.004 corresponding to an expected batch size of 256, a gradient clipping ℓ₂-norm of 1.0, and fine-tune for three epochs with LoRA of dimension 4 and scaling factor of 32. We vary the noise multiplier σ ∈ {0.5715, 0.6072, 0.6366, 0.6945, 0.7498}, approximately corresponding to ε ∈ {3.95, 3.2, 2.7, 1.9, 1.45}, respectively, at δ = 10⁻⁵. |
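The paper's central idea, calibrating noise to operational attack risk (sensitivity/specificity of a membership-inference adversary) rather than to ε, can be illustrated with a minimal standard-library sketch. This is *not* the paper's Algorithm 1 or the `riskcal` package API: it models only a single release of a sensitivity-1 Gaussian mechanism (i.e., μ-GDP with μ = 1/σ), whereas the DP-SGD experiments above involve Poisson subsampling and multi-step composition, for which the σ ↔ ε mapping in the table was computed with a proper accountant. The function names here are hypothetical; the (ε, δ) advantage bound is the standard hypothesis-testing bound for (ε, δ)-DP.

```python
import math
from statistics import NormalDist

N = NormalDist()  # standard normal distribution (stdlib, Python 3.8+)

def gaussian_tradeoff(alpha: float, sigma: float) -> float:
    """Trade-off curve f(alpha) for one Gaussian mechanism with sensitivity 1
    and noise multiplier sigma (mu-GDP with mu = 1/sigma): the smallest attack
    false-negative rate achievable at false-positive rate alpha."""
    mu = 1.0 / sigma
    return N.cdf(N.inv_cdf(1.0 - alpha) - mu)

def gaussian_advantage(sigma: float) -> float:
    """Maximum membership-inference advantage (TPR - FPR) under mu-GDP,
    mu = 1/sigma; equals 2 * Phi(mu / 2) - 1."""
    return 2.0 * N.cdf(1.0 / (2.0 * sigma)) - 1.0

def eps_delta_advantage(eps: float, delta: float) -> float:
    """Advantage bound implied by an (eps, delta)-DP guarantee alone."""
    return (math.exp(eps) - 1.0 + 2.0 * delta) / (math.exp(eps) + 1.0)

if __name__ == "__main__":
    sigma = 1.0
    print(f"f(0.05) at sigma={sigma}: {gaussian_tradeoff(0.05, sigma):.4f}")
    print(f"max attack advantage (f-DP view): {gaussian_advantage(sigma):.4f}")
    print(f"advantage bound from (eps=1, delta=1e-5)-DP: "
          f"{eps_delta_advantage(1.0, 1e-5):.4f}")
```

For a fixed tolerable attack advantage, inverting `gaussian_advantage` yields a smaller σ than first converting the target to an ε and inverting that, which is the utility gain the experiments in the table quantify; the paper's released `riskcal` package implements the full calibration for composed DP-SGD guarantees.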