One-shot Empirical Privacy Estimation for Federated Learning
Authors: Galen Andrew, Peter Kairouz, Sewoong Oh, Alina Oprea, Hugh Brendan McMahan, Vinith Menon Suriyakumar
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we present a novel one-shot approach that can systematically address these challenges, allowing efficient auditing or estimation of the privacy loss of a model during the same, single training run used to fit model parameters, and without requiring any a priori knowledge about the model architecture, task, or DP training algorithm. We show that our method provides provably correct estimates for the privacy loss under the Gaussian mechanism, and we demonstrate its performance on well-established FL benchmark datasets under several adversarial threat models. |
| Researcher Affiliation | Collaboration | Google; Northeastern University; MIT |
| Pseudocode | Yes | Algorithm 1: One-shot privacy estimation for the Gaussian mechanism. Algorithm 2: Privacy estimation via random canaries. Algorithm 3: Privacy estimation via random canaries using all iterates. (See the estimation sketch after this table.) |
| Open Source Code | Yes | Code to reproduce experiments is available at https://github.com/google-research/federated/tree/master/one_shot_epe. |
| Open Datasets | Yes | In this section we present the results of experiments estimating the privacy leakage while training a model on a large-scale public federated learning dataset: the stackoverflow word prediction data/model of Reddi et al. (2020). We present experimental results on the image dataset EMNIST in Appendix F. |
| Dataset Splits | No | The paper describes the training process and client participation (e.g., "train the model for 2048 rounds with 167 clients per round"), but it does not specify explicit train/validation/test data splits (e.g., percentages or absolute counts) for the datasets used. |
| Hardware Specification | No | The paper mentions general concepts like "client devices" but does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library names with specific version numbers like PyTorch 1.9 or TensorFlow 2.x). |
| Experiment Setup | Yes | We train the model for 2048 rounds with 167 clients per round, where each of the m = 341k clients participates in exactly one round, amounting to a single epoch over the data. We use the adaptive clipping method of Andrew et al. (2021). With preliminary manual tuning, we selected a client learning rate of 1.0, server learning rate of 0.56, and momentum of 0.9 on the server for all experiments because this choice gives good performance over a range of levels of DP noise. We use 1k canaries for each set of cosines; experiments with intermediate iterates use 1k observed and 1k unobserved canaries. We fix δ = m^(-1.1). We consider noise multipliers in the range 0.0496 to 0.2317, corresponding to analytical ε estimates from 300 down to 30. (See the numerical check after this table.) |
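The paper's Algorithms 1 and 2 reduce the privacy estimate to a simple statistic: the cosine of each random unit-norm canary with the final model delta, compared against the cosines of unobserved canaries, which for random unit vectors in high dimension are approximately N(0, 1/d). The sketch below is a hedged reconstruction of that final estimation step, not the authors' released code; the function names, and the choice of estimating the effective noise multiplier as the null standard deviation divided by the observed mean, are our assumptions.

```python
import numpy as np
from scipy.stats import norm


def delta_gaussian(eps, sigma):
    """Exact delta(eps) for a sensitivity-1 Gaussian mechanism with noise
    stddev sigma (Balle & Wang, 2018). The second term is computed in log
    space to avoid overflow at large eps."""
    return (norm.cdf(1 / (2 * sigma) - eps * sigma)
            - np.exp(eps + norm.logcdf(-1 / (2 * sigma) - eps * sigma)))


def eps_from_sigma(sigma, delta, eps_max=1000.0, iters=100):
    """Invert delta_gaussian by bisection: the smallest eps with
    delta_gaussian(eps, sigma) <= delta (delta is decreasing in eps)."""
    lo, hi = 0.0, eps_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if delta_gaussian(mid, sigma) > delta:
            lo = mid
        else:
            hi = mid
    return hi


def one_shot_epsilon(model_delta, observed, unobserved, delta):
    """Hypothetical sketch of the cosine-based estimate (cf. Algorithms 1-2).

    observed / unobserved: (k, d) arrays of unit-norm canary vectors; only
    the observed ones were inserted during training.
    """
    v = model_delta / np.linalg.norm(model_delta)
    cos_obs = observed @ v    # shifted by each canary's own contribution
    cos_null = unobserved @ v  # approx N(0, 1/d) for random unit vectors
    # Assumption: treat the two cosine populations as equal-variance
    # Gaussians; their separation in null-stddev units defines an effective
    # noise multiplier for an equivalent one-shot Gaussian mechanism.
    sigma_hat = cos_null.std() / cos_obs.mean()
    return eps_from_sigma(sigma_hat, delta)
```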
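As a sanity check on the experiment-setup numbers, the snippet below (reusing `eps_from_sigma` from the sketch above) computes δ = m^(-1.1) for m = 341k and converts the quoted noise-multiplier endpoints to analytical ε. It assumes that, because each client participates in exactly one round, the analytical guarantee reduces to a single Gaussian mechanism at the given noise multiplier; the paper's exact accounting may differ in detail.

```python
m = 341_000                 # StackOverflow clients, each sampled in one round
delta = m ** -1.1           # the paper's delta = m^(-1.1), about 8.2e-7

for z in (0.2317, 0.0496):  # noise-multiplier endpoints quoted in the paper
    print(f"z = {z}: analytical eps ~ {eps_from_sigma(z, delta):.1f}")
# Expect roughly eps = 30 for z = 0.2317 and eps = 300 for z = 0.0496,
# matching the range reported in the experiment setup.
```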