reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Privacy Budget Tailoring in Private Data Analysis

Authors: Daniel Alabi, Chris Wiggins

TMLR 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We validate the proposed group-aware budget allocation method on synthetic and real-world datasets, where we show significant reductions in prediction error for the smallest groups while still preserving sufficient privacy to protect the minority group from re-identification attacks.
Researcher Affiliation	Academia	Daniel Alabi EMAIL Columbia University Chris Wiggins EMAIL Columbia University
Pseudocode	Yes	Algorithm 1 Stage 1: τ-zCDP Standard Errors Computation for All Groups in Linear Regression... Algorithm 2 Stage 2: µ-zCDP Gradient Descent for Lagrangian Fair Regression
Open Source Code	Yes	We include a copy of a Numpy implementation of the algorithms that we present in the main paper along with a README and a requirements file.
Open Datasets	Yes	The second real-world data set that we consider consists of demographic variables for law school students derived from the Longitudinal Bar Passage Study (Wightman, 1998).
Dataset Splits	No	The paper does not explicitly provide training/test/validation dataset splits. It mentions subsampling for group definitions (e.g., "We subsampled the dataset into two: 11,480 males and 2,937 females") and discusses results across different numbers of groups, but not standard model training splits.
Hardware Specification	Yes	All our experiments are run on a MacBook Pro (13-inch, 2018) with a 2.3 GHz Quad-Core Intel Core i5 with 16 GB Memory.
Software Dependencies	No	The paper mentions a "Numpy implementation" and a "requirements file" in the supplementary material, but does not provide specific version numbers for Numpy or other software dependencies within the main text.
Experiment Setup	Yes	For all our experiments, the total privacy budget is ρ = τ + µ. We allocate 20% of the budget to stage 1 (i.e., τ = 0.2ρ) and 80% to stage 2 (i.e., µ = 0.8ρ)... In all the synthetic data experiments below, the clipping bound (for the gradients and losses) is set to 2. We vary the privacy parameter from ρ = 0.1² to ρ = 10²/2 and our Monte Carlo results are averaged over 1000 trials. σₑ is the noise in the dependent variable for both groups. We set σₑ = 1.
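
The two-stage budget split quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `split_budget` and `gaussian_sigma` are hypothetical, and the example value of ρ is arbitrary. The noise calibration uses the standard Gaussian mechanism bound for zero-concentrated differential privacy (noise with standard deviation Δ/√(2ρ) satisfies ρ-zCDP for a query of sensitivity Δ). Whether the paper calibrates its per-stage noise in exactly this way is an assumption.

```python
import math


def split_budget(rho, stage1_fraction=0.2):
    """Split a total zCDP budget rho into (tau, mu) with tau + mu = rho.

    stage1_fraction=0.2 mirrors the 20% / 80% split quoted in the table;
    the function name and signature are hypothetical.
    """
    if rho <= 0:
        raise ValueError("rho must be positive")
    if not 0 < stage1_fraction < 1:
        raise ValueError("stage1_fraction must lie strictly between 0 and 1")
    tau = stage1_fraction * rho  # budget for stage 1
    mu = rho - tau               # remaining budget for stage 2
    return tau, mu


def gaussian_sigma(sensitivity, rho):
    """Noise standard deviation for the Gaussian mechanism under rho-zCDP.

    Standard result: adding N(0, sigma^2) noise with
    sigma = sensitivity / sqrt(2 * rho) satisfies rho-zCDP.
    Applying it per stage here is an illustrative assumption.
    """
    if rho <= 0:
        raise ValueError("rho must be positive")
    return sensitivity / math.sqrt(2.0 * rho)


if __name__ == "__main__":
    rho = 1.0  # arbitrary example value, not taken from the paper
    tau, mu = split_budget(rho)
    print("tau =", tau, "mu =", mu)
    # Example sensitivity of 2.0 chosen to match the quoted clipping bound.
    print("stage 1 sigma =", gaussian_sigma(2.0, tau))
    print("stage 2 sigma =", gaussian_sigma(2.0, mu))
```

Because zCDP budgets compose additively, running a τ-zCDP stage followed by a µ-zCDP stage costs τ + µ = ρ in total, which is why a simple proportional split is enough to keep the overall guarantee at ρ.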