Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Privacy Budget Tailoring in Private Data Analysis
Authors: Daniel Alabi, Chris Wiggins
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the proposed group-aware budget allocation method on synthetic and real-world datasets, where we show significant reductions in prediction error for the smallest groups while still preserving sufficient privacy to protect the minority group from re-identification attacks. |
| Researcher Affiliation | Academia | Daniel Alabi EMAIL Columbia University Chris Wiggins EMAIL Columbia University |
| Pseudocode | Yes | Algorithm 1 Stage 1: ρ-zCDP Standard Errors Computation for All Groups in Linear Regression... Algorithm 2 Stage 2: µ-zCDP Gradient Descent for Lagrangian Fair Regression |
| Open Source Code | Yes | We include a copy of a NumPy implementation of the algorithms that we present in the main paper, along with a README and a requirements file. |
| Open Datasets | Yes | The second real-world data set that we consider consists of demographic variables for law school students derived from the Longitudinal Bar Passage Study (Wightman, 1998). |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits. It mentions subsampling for group definitions (e.g., "We subsampled the dataset into two: 11,480 males and 2,937 females") and discusses results across different numbers of groups, but not standard model training splits. |
| Hardware Specification | Yes | All our experiments are run on a MacBook Pro (13-inch, 2018) with a 2.3GHz Quad-Core Intel Core i5 with 16GB Memory. |
| Software Dependencies | No | The paper mentions a "Numpy implementation" and a "requirements file" in the supplementary material, but does not provide specific version numbers for Numpy or other software dependencies within the main text. |
| Experiment Setup | Yes | For all our experiments, the total privacy budget is ρ + µ. We allocate 20% of the budget to stage 1 (i.e., ρ) and 80% to stage 2 (i.e., µ)... In all the synthetic data experiments below, the clipping bound (for the gradients and losses) is set to 2. We vary the total privacy budget from 0.1²/2 to 10²/2, and our Monte Carlo results are averaged over 1000 trials. σe is the noise in the dependent variable for both groups. We set σe = 1. |
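The 20%/80% budget split and the gradient clipping described in the experiment setup can be sketched as follows. This is an illustrative reconstruction, not the authors' released NumPy implementation: the function names are hypothetical, and the noise scale uses the standard ρ-zCDP Gaussian-mechanism calibration σ = Δ/√(2ρ), which the row above does not spell out.

```python
import numpy as np

def split_budget(rho_total, stage1_frac=0.2):
    """Split a total zCDP budget between the two stages (20%/80%, as in the paper)."""
    rho_stage1 = stage1_frac * rho_total          # stage 1: standard-error computation
    mu_stage2 = (1.0 - stage1_frac) * rho_total   # stage 2: gradient descent
    return rho_stage1, mu_stage2

def gaussian_noise_scale(l2_sensitivity, rho):
    """Std. dev. of Gaussian noise that gives rho-zCDP for a query with this L2 sensitivity."""
    return l2_sensitivity / np.sqrt(2.0 * rho)

def clip_gradient(grad, bound=2.0):
    """Clip a gradient to L2 norm `bound` (the paper sets the clipping bound to 2)."""
    norm = np.linalg.norm(grad)
    return grad if norm <= bound else grad * (bound / norm)

# Example: total budget 1.0, one privatized gradient step.
rho1, mu = split_budget(1.0)
sigma = gaussian_noise_scale(l2_sensitivity=2.0, rho=mu)  # sensitivity = clipping bound
rng = np.random.default_rng(0)
noisy_grad = clip_gradient(np.array([3.0, 4.0])) + rng.normal(0.0, sigma, size=2)
```

With the clipping bound serving as the L2 sensitivity, each clipped gradient plus Gaussian noise at this scale consumes the stage-2 budget µ per release; composing releases across both stages sums the zCDP budgets, which is what makes the ρ + µ accounting above work.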