reproducibilityindex.ai

Privacy Auditing with One (1) Training Run

Authors: Thomas Steinke, Milad Nasr, Matthew Jagielski

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our results: As an application of our new auditing framework, we audit DP-SGD training on a Wide ResNet model, trained on the CIFAR10 dataset across multiple configurations. Our approach successfully achieves an empirical lower bound of ε ≥ 1.8, compared to a theoretical upper bound of ε ≤ 4 in the white-box setting. The m examples we insert for auditing (known in the literature as canaries) do not significantly impact the accuracy of the final model (less than a 5% decrease in accuracy) and our procedure only requires a single end-to-end training run. Such results were previously unattainable in the setting where only one model could be trained.
Researcher Affiliation	Industry	Thomas Steinke, Google DeepMind, steinke@google.com; Milad Nasr, Google DeepMind, srxzr@google.com; Matthew Jagielski, Google DeepMind, jagielski@google.com
Pseudocode	Yes	Algorithm 1 Auditor with One Training Run 1: Data: x ∈ X^n consisting of m auditing examples (a.k.a. canaries) x_1, …, x_m and n − m non-auditing examples x_{m+1}, …, x_n. 2: Parameters: Algorithm to audit A, number of examples to randomize m, number of positive k_+ and negative k_− guesses. 3: For i ∈ [m], sample S_i ∈ {−1, +1} uniformly and independently. Set S_i = 1 for all i ∈ [n] \ [m]. 4: Partition x into x_IN ∈ X^{n_IN} and x_OUT ∈ X^{n_OUT} according to S, where n_IN + n_OUT = n. Namely, if S_i = 1, then x_i is in x_IN; and, if S_i = −1, then x_i is in x_OUT. 5: Run A on input x_IN with appropriate parameters, outputting w. 6: Compute the vector of scores Y = (SCORE(x_i, w) : i ∈ [m]) ∈ R^m. 7: Sort the scores Y. Let T ∈ {−1, 0, +1}^m be +1 for the largest k_+ scores and −1 for the smallest k_− scores. (I.e., T ∈ {−1, 0, +1}^m maximizes Σ_{i=1}^m T_i Y_i subject to Σ_{i=1}^m |T_i| = k_+ + k_− and Σ_{i=1}^m T_i = k_+ − k_−.) 8: Return: S ∈ {−1, +1}^m indicating the true selection and the guesses T ∈ {−1, 0, +1}^m.
Open Source Code	No	No explicit statement about providing open-source code for the described methodology or a link to a code repository was found.
Open Datasets	Yes	We run DP-SGD on the CIFAR-10 dataset with Wide ResNet (WRN-16) [ZK16], following the experimental setup of Nasr et al. [NHSBTJCT23].
Dataset Splits	No	We run DP-SGD on the CIFAR-10 dataset with Wide ResNet (WRN-16) [ZK16], following the experimental setup of Nasr et al. [NHSBTJCT23]. ... We used m = 5000 and all of the training dataset from CIFAR10 (n = 50,000) for the attack. (While the paper mentions using the CIFAR-10 dataset, it does not explicitly state the training, validation, and test splits used for its experiments, only that 'all of the training dataset' was used.)
Hardware Specification	No	No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments were mentioned in the paper.
Software Dependencies	No	No specific software dependencies with version numbers (e.g., libraries, frameworks) were mentioned in the paper.
Experiment Setup	No	The paper states 'We run DP-SGD on the CIFAR-10 dataset with Wide ResNet (WRN-16) [ZK16], following the experimental setup of Nasr et al. [NHSBTJCT23].' It references an external paper for the experimental setup rather than providing the specific hyperparameter values or training configurations within its own text.
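
The steps of Algorithm 1 quoted in the Pseudocode row can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `audit_one_run` and the `train` and `score` callables are hypothetical stand-ins for the algorithm A and the SCORE function, which the quoted text leaves abstract. Ties between equal scores are broken arbitrarily by the sort.

```python
import random
from typing import Any, Callable, List, Sequence, Tuple


def audit_one_run(
    canaries: Sequence[Any],
    non_canaries: Sequence[Any],
    train: Callable[[List[Any]], Any],   # hypothetical stand-in for algorithm A
    score: Callable[[Any, Any], float],  # hypothetical stand-in for SCORE(x_i, w)
    k_pos: int,
    k_neg: int,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Return (S, T): the true inclusion signs and the auditor's guesses."""
    m = len(canaries)
    if k_pos < 0 or k_neg < 0 or k_pos + k_neg > m:
        raise ValueError("need k_pos, k_neg >= 0 and k_pos + k_neg <= m")
    rng = random.Random(seed)

    # Step 3: include or exclude each canary independently with probability 1/2.
    s = [rng.choice((-1, 1)) for _ in range(m)]

    # Step 4: x_IN holds the included canaries plus every non-auditing example.
    x_in = [x for x, s_i in zip(canaries, s) if s_i == 1] + list(non_canaries)

    # Step 5: a single run of the algorithm under audit.
    w = train(x_in)

    # Step 6: score every canary, whether or not it was included.
    y = [score(x, w) for x in canaries]

    # Step 7: guess +1 for the k_pos largest scores, -1 for the k_neg smallest,
    # and abstain (0) on the rest.
    order = sorted(range(m), key=lambda i: y[i])
    t = [0] * m
    for i in order[:k_neg]:
        t[i] = -1
    for i in order[m - k_pos:]:
        t[i] = 1

    # Step 8: return the true selection and the guesses.
    return s, t
```

With a toy `train` that simply memorizes its input (for example `lambda xs: set(xs)`) and a `score` that returns 1.0 for memorized examples, every non-abstaining guess matches S whenever at least `k_pos` canaries were included and at least `k_neg` were excluded. A differentially private algorithm should make such guesses much less reliable, and that gap is what the audit measures.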