Accuracy First: Selecting a Differential Privacy Level for Accuracy-Constrained ERM

Authors: Katrina Ligett, Seth Neel, Aaron Roth, Bo Waggoner, Zhiwei Steven Wu

NeurIPS 2017

Reproducibility Variable | Result | LLM Response

Research Type | Experimental
"In Section 4, we empirically evaluate our noise reduction meta-method, which applies to any ERM technique which can be described as a post-processing of the Laplace mechanism. This includes both direct applications of the Laplace mechanism, like output perturbation [5]; and more sophisticated methods like covariance perturbation [19], which perturbs the covariance matrix of the data and then performs an optimization using the noisy data. Our experiments concentrate on ℓ2-regularized least-squares regression and ℓ2-regularized logistic regression, and we apply our noise reduction meta-method to both output perturbation and covariance perturbation. Our empirical results show that the active, ex-post privacy approach massively outperforms inverting the theory curve, and also improves on a baseline ε-doubling approach."

Researcher Affiliation | Collaboration
Katrina Ligett (Caltech and Hebrew University); Seth Neel, Aaron Roth, and Bo Waggoner (University of Pennsylvania); Zhiwei Steven Wu (Microsoft Research)

Pseudocode | Yes
Algorithm 1 Noise Reduction: NR(v, Δ, {ε_t})
  Input: private vector v, sensitivity parameter Δ, list ε_1 < ε_2 < … < ε_T
  Set v̂_T := v + Lap(Δ/ε_T), drawn i.i.d. for each coordinate
  for t = T−1, T−2, …, 1 do
    With probability (ε_t / ε_{t+1})²: set v̂_t := v̂_{t+1}
    Else: set v̂_t := v̂_{t+1} + Lap(Δ/ε_t), drawn i.i.d. for each coordinate
  Return v̂_1, …, v̂_T
Open Source Code | Yes
"A full implementation of our algorithms appears at: https://github.com/steven7woo/Accuracy-First-Differential-Privacy."

Open Datasets | Yes
"We used ridge regression to predict (log) popularity of posts on Twitter in the dataset of [1]... Logistic regression was applied to classifying network events as innocent or malicious in the KDD-99 Cup dataset [13]."

Dataset Splits | No
The paper mentions using the Twitter dataset and the KDD-99 Cup dataset but does not specify any explicit train/validation/test splits or how the data was partitioned for the experiments.

Hardware Specification | No
The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instances) used for running the experiments.

Software Dependencies | No
The paper does not provide specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks with their versions) that were used for the experiments.

Experiment Setup | No
The paper states "Details of parameters and methods appear in the full version" for its experiments, but does not include specific hyperparameters (e.g., learning rate, batch size, optimizer settings) or detailed system-level training configurations in the main text.
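The pseudocode of Algorithm 1 quoted above can be sketched in plain Python. This is a minimal illustration, not the authors' released implementation (linked in the table); the function names are ours, and we assume the standard L1-sensitivity Laplace mechanism with one copy-or-refresh coin flipped per round for the whole vector, as the pseudocode states.

```python
import random


def lap_sample(scale, rng):
    """Draw one Laplace(0, scale) sample: the difference of two
    i.i.d. Exp(1) variables is standard Laplace."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noise_reduction(v, sensitivity, eps_list, seed=0):
    """Correlated Laplace noise reduction (Algorithm 1 sketch).

    Returns estimates hats[0..T-1] corresponding to v_hat_1..v_hat_T,
    where hats[t] carries per-coordinate Laplace noise of scale
    sensitivity / eps_list[t]. eps_list must be strictly increasing.
    """
    rng = random.Random(seed)
    T = len(eps_list)
    hats = [None] * T
    # Start at the least-noisy estimate (largest epsilon).
    hats[T - 1] = [x + lap_sample(sensitivity / eps_list[T - 1], rng) for x in v]
    # Walk down to noisier estimates, coupling each round to the previous one.
    for t in range(T - 2, -1, -1):
        p = (eps_list[t] / eps_list[t + 1]) ** 2
        if rng.random() < p:
            hats[t] = list(hats[t + 1])  # reuse the previous estimate
        else:
            # Add fresh Laplace noise on top of the previous estimate.
            hats[t] = [x + lap_sample(sensitivity / eps_list[t], rng)
                       for x in hats[t + 1]]
    return hats
```

Because the estimates are coupled across rounds, releasing the prefix v̂_1, …, v̂_t (in order of increasing ε) costs only ε_t under the paper's ex-post privacy accounting, rather than the composition sum Σ_{s≤t} ε_s; this is what lets the accuracy-first search stop at the cheapest privacy level that meets the error target, and why the paper reports it beating the ε-doubling baseline.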