Optimal Unbiased Randomizers for Regression with Label Differential Privacy
Authors: Ashwinkumar Badanidiyuru Varadaraja, Badih Ghazi, Pritish Kamath, Ravi Kumar, Ethan Leeman, Pasin Manurangsi, Avinash V Varadarajan, Chiyuan Zhang
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that these randomizers achieve state-of-the-art privacy-utility trade-offs on several datasets, highlighting the importance of reducing bias when training neural networks with label DP. |
| Researcher Affiliation | Industry | Ashwinkumar Badanidiyuru (Google, Mountain View, CA); Badih Ghazi (Google Research, Mountain View, CA); Pritish Kamath (Google Research, Mountain View, CA); Ravi Kumar (Google Research, Mountain View, CA); Ethan Leeman (Google, Cambridge, MA); Pasin Manurangsi (Google Research, Bangkok, Thailand); Avinash V Varadarajan (Google, Mountain View, CA); Chiyuan Zhang (Google Research, Mountain View, CA) |
| Pseudocode | Yes | Algorithm 1 (ComputeOptUnbiasedRand_ε). Parameters: privacy parameter ε > 0. Input: a prior P = (p_y)_{y∈Y} over the input labels Y, and a finite sequence Ŷ = (ŷ_i)_{i∈I} of potential output labels. Output: an ε-DP label randomizer. Solve the following LP in the variables M = (M_{y→i})_{y∈Y, i∈I}: minimize Σ_{y∈Y} p_y Σ_{i∈I} M_{y→i} · g(ŷ_i, y), subject to [Non-negativity] ∀y∈Y, i∈I: M_{y→i} ≥ 0; [Normalization] ∀y∈Y: Σ_{i∈I} M_{y→i} = 1; [ε-Label DP] ∀i∈I and y, y′∈Y with y ≠ y′: M_{y→i} ≤ e^ε · M_{y′→i}; [Unbiasedness] ∀y∈Y: Σ_{i∈I} M_{y→i} · ŷ_i = y. Return the label randomizer M mapping Y to Ŷ given by Pr[M(y) = ŷ_i] = M_{y→i}. (A runnable LP sketch appears after the table.) |
| Open Source Code | No | The paper does not provide any explicit statements about making the source code available or links to a code repository for the described methodology. |
| Open Datasets | Yes | The Criteo Sponsored Search Conversion Log Dataset [TY18] is a collection of 15,995,634 data points derived from a sample of 90-day logs of live traffic from Criteo Predictive Search (CPS). The 1940 US Census dataset is widely used in the evaluation of data analysis with DP [WDZ+19, CJG21, GKM+21]. This dataset, digitally released in 2012, contains 131,903,909 examples. |
| Dataset Splits | Yes | The training was performed on a random 80% of the dataset using the RMSProp algorithm with the squared loss objective, with a learning rate of 10^-4, ℓ2-regularization of 10^-4, and a batch size of 1,024 for 50 epochs. The remaining 20% of the dataset was used to report the test loss. |
| Hardware Specification | Yes | All our experiments were performed using NVidia P100 GPUs. |
| Software Dependencies | No | The paper mentions software components like RMSProp algorithm but does not provide specific version numbers for any software dependencies or libraries used in their experiments. |
| Experiment Setup | Yes | The training was performed on a random 80% of the dataset using the RMSProp algorithm with the squared loss objective, with a learning rate of 10^-4, ℓ2-regularization of 10^-4, and a batch size of 1,024 for 50 epochs. (A hedged training-configuration sketch follows the table.) |
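Since no source code is released (see the Open Source Code row above), the following is a minimal, hedged sketch of Algorithm 1 as a linear program in SciPy. The function name `opt_unbiased_randomizer`, the toy label grids, the uniform prior, and the squared-loss choice of g are illustrative assumptions, not artifacts from the paper.

```python
# Sketch of Algorithm 1 (ComputeOptUnbiasedRand_eps) as an LP, assuming
# the expected-loss objective sum_y p_y sum_i M[y,i] * g(yhat_i, y).
import numpy as np
from scipy.optimize import linprog

def opt_unbiased_randomizer(eps, prior, in_labels, out_labels, g):
    """Solve the LP for an eps-label-DP unbiased randomizer.

    prior:      shape (|Y|,), prior p_y over input labels
    in_labels:  shape (|Y|,), input label values y
    out_labels: shape (|I|,), candidate output values yhat_i
    g:          loss g(yhat, y), e.g. squared loss
    Returns M of shape (|Y|, |I|) with Pr[M(y) = yhat_i] = M[y, i].
    """
    nY, nI = len(in_labels), len(out_labels)
    # Objective: minimize sum_y p_y * sum_i M[y,i] * g(yhat_i, y).
    c = np.array([prior[y] * g(out_labels[i], in_labels[y])
                  for y in range(nY) for i in range(nI)])

    A_eq, b_eq = [], []
    for y in range(nY):
        # [Normalization] sum_i M[y,i] = 1.
        row = np.zeros(nY * nI)
        row[y * nI:(y + 1) * nI] = 1.0
        A_eq.append(row); b_eq.append(1.0)
        # [Unbiasedness] sum_i M[y,i] * yhat_i = y.
        row = np.zeros(nY * nI)
        row[y * nI:(y + 1) * nI] = out_labels
        A_eq.append(row); b_eq.append(in_labels[y])

    A_ub, b_ub = [], []
    # [eps-Label DP] M[y,i] <= e^eps * M[y',i] for all i and y != y'.
    for i in range(nI):
        for y in range(nY):
            for yp in range(nY):
                if y != yp:
                    row = np.zeros(nY * nI)
                    row[y * nI + i] = 1.0
                    row[yp * nI + i] = -np.exp(eps)
                    A_ub.append(row); b_ub.append(0.0)

    # [Non-negativity] handled by the variable bounds.
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None))
    return res.x.reshape(nY, nI)

# Toy usage: input labels {0, 1, 2}, uniform prior, squared loss.
out = np.linspace(-2.0, 4.0, 13)
M = opt_unbiased_randomizer(eps=1.0, prior=np.ones(3) / 3,
                            in_labels=np.array([0.0, 1.0, 2.0]),
                            out_labels=out,
                            g=lambda yhat, y: (yhat - y) ** 2)
print(M.sum(axis=1))  # rows sum to 1 (valid distributions)
print(M @ out)        # row means recover the input labels (unbiasedness)
```

Note that the output grid must extend beyond the input label range for the unbiasedness and ε-label-DP constraints to be simultaneously feasible.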
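The reported training setup can likewise be sketched. The TensorFlow/Keras snippet below follows only the quoted hyperparameters (RMSProp, squared loss, learning rate 10^-4, ℓ2-regularization 10^-4, batch size 1,024, 50 epochs, random 80/20 split); the two-layer network and the synthetic data are placeholder assumptions, not the paper's model or datasets.

```python
# Hedged sketch of the quoted training configuration; the architecture
# and random data are placeholders standing in for Criteo / US Census.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
x = rng.normal(size=(10_000, 20)).astype("float32")  # stand-in features
y = rng.normal(size=(10_000, 1)).astype("float32")   # stand-in (randomized) labels

# Random 80/20 train/test split, as reported.
idx = rng.permutation(len(x))
split = int(0.8 * len(x))
train, test = idx[:split], idx[split:]

l2 = tf.keras.regularizers.l2(1e-4)  # reported l2-regularization of 10^-4
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", kernel_regularizer=l2),
    tf.keras.layers.Dense(1, kernel_regularizer=l2),
])
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4),
              loss="mse")  # squared loss objective
model.fit(x[train], y[train], batch_size=1024, epochs=50,
          validation_data=(x[test], y[test]))  # 20% held out for test loss
```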