Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Privacy Budget Tailoring in Private Data Analysis
Authors: Daniel Alabi, Chris Wiggins
TMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the proposed, group-aware budget allocation, method on synthetic and real-world datasets where we show significant reductions in prediction error for the smallest groups, while still preserving sufficient privacy to protect the minority group from re-identification attacks. |
| Researcher Affiliation | Academia | Daniel Alabi EMAIL Columbia University Chris Wiggins EMAIL Columbia University |
| Pseudocode | Yes | Algorithm 1 Stage 1: ρ-zCDP Standard Errors Computation for All Groups in Linear Regression... Algorithm 2 Stage 2: µ-zCDP Gradient Descent for Lagrangian Fair Regression |
| Open Source Code | Yes | We include a copy of a Numpy implementation of the algorithms that we present in the main paper along with a README and a requirements file. |
| Open Datasets | Yes | The second real-world data set that we consider consists of demographic variables for law school students derived from the Longitudinal Bar Passage Study (Wightman, 1998). |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits. It mentions subsampling for group definitions (e.g., "We subsampled the dataset into two: 11,480 males and 2,937 females") and discusses results across different numbers of groups, but not standard model training splits. |
| Hardware Specification | Yes | All our experiments are run on a MacBook Pro (13-inch, 2018) with a 2.3GHz Quad-Core Intel Core i5 with 16GB Memory. |
| Software Dependencies | No | The paper mentions a "Numpy implementation" and a "requirements file" in the supplementary material, but does not provide specific version numbers for Numpy or other software dependencies within the main text. |
| Experiment Setup | Yes | For all our experiments, the total privacy budget is τ = ρ + µ. We allocate 20% of the budget to stage 1 (i.e., ρ = 0.2τ) and 80% to stage 2 (i.e., µ = 0.8τ)... In all the synthetic data experiments below, the clipping bound (for the gradients and losses) is set to 2. We vary the privacy parameter from τ = 0.1² to τ = 10²/2 and our Monte Carlo results are averaged over 1000 trials. σ_e is the noise in the dependent variable for both groups. We set σ_e = 1. |
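
The 20%/80% budget split quoted in the Experiment Setup row can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' NumPy implementation: the function name `split_privacy_budget` and its parameters are hypothetical, and the sketch relies only on the standard fact that zCDP budgets add under composition, so the two stage budgets sum to the total.

```python
from typing import Tuple


def split_privacy_budget(total_budget: float,
                         stage1_fraction: float = 0.2) -> Tuple[float, float]:
    """Split a total zCDP budget between two sequential stages.

    Hypothetical helper: the default of 0.2 mirrors the 20%/80%
    allocation quoted in the table; everything else is illustrative.
    """
    if total_budget <= 0:
        raise ValueError("total_budget must be positive")
    if not 0.0 < stage1_fraction < 1.0:
        raise ValueError("stage1_fraction must lie strictly between 0 and 1")

    # Stage 1 (standard-error estimation) receives the stated fraction.
    stage1_budget = stage1_fraction * total_budget
    # Stage 2 (gradient descent) receives the remainder, so the two
    # budgets always sum to the total under additive zCDP composition.
    stage2_budget = total_budget - stage1_budget
    return stage1_budget, stage2_budget


if __name__ == "__main__":
    # Example total budget chosen purely for illustration.
    stage1, stage2 = split_privacy_budget(0.5)
    print(f"stage 1 budget: {stage1:.3f}, stage 2 budget: {stage2:.3f}")
```

Computing the stage 2 budget as the remainder, rather than as a second multiplication, keeps the two parts summing to the total even if the fraction is changed.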