Privately Publishable Per-instance Privacy

Authors: Rachel Redberg, Yu-Xiang Wang

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Here we evaluate our methods to release the pDP losses using logistic regression as a case study. In Section 4.1, we demonstrate that the stronger regularization required by Algorithm 2 does not affect the utility of the model. In Section 4.2 we show that by carefully allocating the privacy budget of the data-dependent release, we can achieve a more accurate estimate of the ex-post pDP losses of Algorithm 1 compared to the data-independent release, with reasonable overhead (same overall DP budget and only a slight uptick in the overall pDP losses).
Researcher Affiliation | Academia | Rachel Redberg, Department of Computer Science, UC Santa Barbara, Santa Barbara, CA 93106, rredberg@ucsb.edu; Yu-Xiang Wang, Department of Computer Science, UC Santa Barbara, Santa Barbara, CA 93106, yuxiangw@cs.ucsb.edu
Pseudocode | Yes | Algorithm 1: Release θ̂^P via Obj-Pert (Kifer et al., 2012). Input: dataset D, noise parameter σ, regularization parameter λ, loss function L(θ; D) = Σ_i ℓ(θ; z_i), convex and twice-differentiable regularizer r(θ), convex set Θ. Output: θ̂^P, the minimizer of the perturbed objective. Draw noise vector b ~ N(0, σ²I); compute θ̂^P according to (1). Algorithm 2: Privacy report for Obj-Pert on GLMs. Input: θ̂^P ∈ R^d from Obj-Pert; noise parameters σ, σ₂, σ₃; regularization parameter λ; Hessian H := Σ_i ∇²ℓ(θ̂^P; z_i) + λI_d; Boolean B ∈ {DATA-INDEP, DATA-DEP}; failure probability ρ. Require: λ ≥ 2σ₃·F⁻¹_{λ₁(GOE(d))}(1 − ρ/2). Output: reporting function ε: ((x, y), δ) → R³₊. If B = DATA-INDEP: set ε₂(·) := 0 and ε₃(·) := 0; set ḡ^P(z) := σ‖f′(·)x‖₂·F⁻¹_{N(0,1)}(1 − ρ/2) and μ^P(x) := ‖x‖₂²/λ. Else if B = DATA-DEP: privately release ĝ^P by Algorithm ?? with parameter σ₂ and set ε₂(·) according to Theorem ??; set ḡ^P(z) := min{ f′(·)[ĝ^P(z)]ᵀx + σ₂‖f′(·)x‖₂·F⁻¹_{N(0,1)}(1 − ρ/2), σ‖f′(·)x‖₂·F⁻¹_{N(0,1)}(1 − ρ/2) }; privately release Ĥ^P by a variant of Analyze Gauss with parameter σ₃ and set ε₃(·) according to Statement 2 of Theorem 10; set μ^P(x) := (3/2)·xᵀ[Ĥ^P]⁻¹x. Finally, set ε^P₁(z) := |log(1 − f″(·)μ^P(x))| + ‖f′(·)x‖₂²/(2σ²) + ḡ^P(z)/σ², and output the function ε(z) := (ε^P₁(z), ε₂(z), ε₃(z)). (Hedged Python sketches of Algorithm 1 and the DATA-INDEP branch of Algorithm 2 follow the table.)
Open Source Code | No | The paper does not provide an explicit statement of, or link to, open-source code for the described methodology.
Open Datasets | Yes | The following experiments feature the credit card default dataset (n = 30000, d = 21) (Yeh & Lien, 2009) from the UCI Machine Learning Repository.
Dataset Splits | No | The paper mentions a synthetic dataset and the credit card default dataset, and refers implicitly to a training set when computing losses for "each z in the training set." However, it does not specify split percentages (e.g., train/validation/test), absolute sample counts per split, or a predefined standard split, so the information is insufficient to reproduce the data partitioning.
Hardware Specification | No | The paper does not describe the hardware used to run its experiments; no GPU models, CPU models, or cloud instance types are mentioned.
Software Dependencies | No | The paper does not give version numbers for any software components, libraries, or solvers used in the experiments (e.g., "Python 3.8" or "PyTorch 1.9").
Experiment Setup | Yes | For Figure 1, we use a synthetic dataset D sampled from the unit ball with two linearly separable classes separated by margin m = 0.4. Then we solve for θ̂ = argmin J(θ; D) with λ = 1 to minimize the logistic loss, and directly perturb the output by rotating it by angle ω ∈ {0, π/4, π}. We then denote θ̂^P := θ̂₊ω to mean θ̂ rotated counter-clockwise by angle ω. ... In this experiment we use a synthetic dataset generated by sampling x_i, θ ~ N(0, I_d) and normalizing each x_i ∈ X so that ‖x_i‖₂ = 1. Then we rescale Y = Xθ to ensure y_i ∈ [0, 1] for each y_i ∈ Y. ... The failure probabilities for both Algorithms 1 and 2 are set as δ = ρ = 10⁻⁶. Our choices of σ and λ depend on ε₁ and follow the requirements stated in Theorem 5 to achieve DP. We don't use any additional regularization, i.e., r(θ) = 0. For the data-dependent release, the noise parameters σ₂, σ₃ are each calibrated according to the analytic Gaussian mechanism of Balle & Wang (2018). ... For the data-independent release we use the entire privacy budget on releasing θ̂^P (ε₁ = 1). For the data-dependent release we reserve some of the privacy budget for releasing μ^P(·) and ḡ^P(·) (ε₁ = 0.2, ε₂ = 0.7, ε₃ = 0.1). (A sketch of the Figure 1 setup follows the table.)
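
Below is a minimal Python sketch of Algorithm 1 (objective perturbation for regularized logistic regression), referenced from the Pseudocode row above. It assumes the standard Obj-Pert objective J(θ; D) = Σ_i log(1 + exp(−y_i·x_iᵀθ)) + (λ/2)‖θ‖² + bᵀθ with b ~ N(0, σ²I); the function name, parameter values, and toy data are illustrative and not the authors' code, and the calibration of σ and λ to a target (ε, δ) is omitted.

```python
# Hedged sketch of Algorithm 1: release theta_hat^P via objective
# perturbation, assuming the perturbed objective
#   J(theta) = sum_i log(1 + exp(-y_i * x_i @ theta))
#              + (lambda/2) * ||theta||^2 + b @ theta,  b ~ N(0, sigma^2 I).
import numpy as np
from scipy.optimize import minimize

def obj_pert_logreg(X, y, sigma, lam, seed=None):
    """Release theta_hat^P by minimizing the noise-perturbed objective."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    b = rng.normal(scale=sigma, size=d)            # noise vector b ~ N(0, sigma^2 I)

    def perturbed_objective(theta):
        margins = y * (X @ theta)
        loss = np.logaddexp(0.0, -margins).sum()   # logistic loss, stable form
        return loss + 0.5 * lam * theta @ theta + b @ theta

    res = minimize(perturbed_objective, x0=np.zeros(d), method="L-BFGS-B")
    return res.x

# Example on toy data with labels in {-1, +1} and unit-norm rows:
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = np.sign(X @ rng.normal(size=5))
theta_p = obj_pert_logreg(X, y, sigma=5.0, lam=1.0, seed=1)
```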
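Next, a hedged sketch of the DATA-INDEP branch of Algorithm 2 for logistic regression. The ε₁ expression mirrors the formula stated in the Pseudocode row and should be checked against the paper's theorems before use; `pdp_report_data_indep` is a hypothetical helper name, and this is an illustration of the quantities involved rather than the authors' implementation.

```python
# Hedged sketch of Algorithm 2 (DATA-INDEP branch) for logistic regression.
# For the logistic loss f(s) = log(1 + exp(-s)) at s = y * <x, theta_p>:
#   f'(s)  = -sigmoid(-s)                         (in (-1, 0))
#   f''(s) = sigmoid(s) * sigmoid(-s) = -f'(s) * (1 + f'(s))
import numpy as np
from scipy.special import expit                    # numerically stable sigmoid
from scipy.stats import norm

def pdp_report_data_indep(theta_p, X, y, sigma, lam, rho=1e-6):
    """Per-instance ex-post pDP estimate; eps2 = eps3 = 0 in this branch."""
    s = y * (X @ theta_p)                          # per-instance margins
    f1 = -expit(-s)                                # f'(s)
    f2 = -f1 * (1.0 + f1)                          # f''(s), stable form
    xnorm = np.linalg.norm(X, axis=1)
    # High-probability bound g_bar on |<f'(.) x, b>| with b ~ N(0, sigma^2 I):
    g_bar = sigma * np.abs(f1) * xnorm * norm.ppf(1.0 - rho / 2.0)
    mu = xnorm ** 2 / lam                          # data-independent mu^P(x)
    return (np.abs(np.log1p(-f2 * mu))             # log-determinant term
            + (f1 * xnorm) ** 2 / (2.0 * sigma ** 2)
            + g_bar / sigma ** 2)
```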
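Finally, a sketch of the Figure 1 setup from the Experiment Setup row: a 2-D unit-ball dataset with two linearly separable classes at margin m = 0.4, and a counter-clockwise rotation of the non-private solution θ̂. The sampling procedure (rejection sampling around the hyperplane x₁ = 0) is an assumption; the quoted text does not specify it.

```python
# Sketch of the Figure 1 setup, assuming classes separated by the
# hyperplane x[0] = 0 with each class at least m/2 away from it.
import numpy as np

rng = np.random.default_rng(0)
m, n = 0.4, 500

# Rejection-sample points in the unit ball, keeping those at margin m.
pts = rng.uniform(-1, 1, size=(10 * n, 2))
pts = pts[np.linalg.norm(pts, axis=1) <= 1]
pts = pts[np.abs(pts[:, 0]) >= m / 2][:n]
X, y = pts, np.sign(pts[:, 0])                 # labels from the margin side

def rotate(theta, omega):
    """Counter-clockwise rotation of a 2-D vector by angle omega."""
    R = np.array([[np.cos(omega), -np.sin(omega)],
                  [np.sin(omega),  np.cos(omega)]])
    return R @ theta

# theta_hat = argmin J(theta; D) with lambda = 1 (e.g. obj_pert_logreg
# above with sigma = 0, or any logistic-regression solver); placeholder here:
theta_hat = np.array([1.0, 0.0])
for omega in (0.0, np.pi / 4, np.pi):          # the three angles in the quote
    theta_p = rotate(theta_hat, omega)
```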