Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Differentially Private Bootstrap: New Privacy Analysis and Inference Strategies

Authors: Zhanyu Wang, Guang Cheng, Jordan Awan

JMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our simulations show the advantage of our deconvolution method in terms of the coverage of the CIs compared to Du et al. (2020), and the CI width compared to Brawner and Honaker (2018). We also conduct numerical experiments on the 2016 Canada Census Public Use Microdata, which reveal the dependence between individuals' income and shelter cost under DP guarantees by building CIs for the slope parameters of logistic regression and quantile regression.
Researcher Affiliation | Academia | Zhanyu Wang, Department of Statistics, Purdue University, West Lafayette, IN 47906, USA; Guang Cheng, Department of Statistics, University of California, Los Angeles, Los Angeles, CA 90095, USA; Jordan Awan, Department of Statistics, Purdue University, West Lafayette, IN 47906, USA, and Department of Statistics, University of Pittsburgh, Pittsburgh, PA 15260, USA
Pseudocode | Yes | Algorithm 1: DP bootstrap estimates (with Gaussian mechanism); Algorithm 2: DP bootstrap Asymptotic CI; Algorithm 3: DP bootstrap deconvolution sampling distribution and CI
Open Source Code | No | The paper does not explicitly provide a link to source code, a statement of code release, or indicate that code is provided in supplementary materials.
Open Datasets | Yes | We conduct experiments with the 2016 Census Public Use Microdata Files (PUMF), which provide data on the characteristics of the Canadian population (Canada, 2019). Statistics Canada. 2016 Census Public Use Microdata File (PUMF). Individuals File, 2019.
Dataset Splits | Yes | To evaluate the performance of different statistical inference methods, we calculate the coverage and width of the CIs from 2000 simulations for each setting where the input data sets are sampled from the original data set with replacement with size n = 1000, 3000, 10000, 30000, 100000.
Hardware Specification | No | The paper does not explicitly describe the hardware used for running its experiments.
Software Dependencies | No | Among different deconvolution methods, we choose to use deconvolveR (Efron, 2016) since it performs the best in our settings without tuning its hyper-parameters. Using R, we have $t_1 = 0.2228743\ldots$, $t_2 =$ . Then $\delta_1 = F_{D_{2,1}}(t_1) - e\,F_{D_{1,1}}(t_1)$ where $F$ is the CDF corresponding to $f$. Using R, we have $\delta_1 = 0.4475773$.
Experiment Setup | Yes | The privacy guarantee is set to be 1-GDP, the confidence level is 90%, and we use B = 100 for bootstrap and DP bootstrap. We set the response $y_i = 1$ if SHELCO ≤ 0.5, otherwise $y_i = -1$. In logistic regression, the model is $P(Y \mid X) = 1/(1+\exp(-\theta^\top Y X))$, and the empirical risk minimizer (ERM), also the maximum likelihood estimate of $\theta$, is $\hat\theta = \arg\min_\theta R(\theta)$ where $R(\theta) := \frac{1}{n}\sum_{i=1}^{n} -\log P(y_i \mid x_i)$. ...the true parameter $\theta = (\theta_1, \theta_2) \in \mathbb{R}^2$ as the regularized ERM estimated with the original data set under the same c. ...the results are shown in the upper figures of Figure 8b where c = 1. ...where we set c = 1 and τ = 0.5.
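
The evaluation protocol quoted under Dataset Splits and Experiment Setup (repeated resampling with replacement from the original data set, B = 100 bootstrap replicates, 90% confidence level, then reporting coverage and width) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses a plain, non-private percentile bootstrap as a stand-in for the paper's Algorithms 1 to 3, treats the statistic computed on the full data set as the "true" parameter, and runs on synthetic Gaussian data rather than the PUMF file. The function names `percentile_bootstrap_ci` and `coverage_and_width` are hypothetical and do not come from the paper.

```python
import random
import statistics


def percentile_bootstrap_ci(sample, stat, B=100, level=0.90, rng=random):
    """Non-private percentile bootstrap CI (a stand-in, NOT the paper's DP method)."""
    n = len(sample)
    # Recompute the statistic on B resamples drawn with replacement.
    estimates = sorted(stat(rng.choices(sample, k=n)) for _ in range(B))
    alpha = 1.0 - level
    # Simple order-statistic approximation of the alpha/2 and 1 - alpha/2 quantiles.
    lo_idx = int((alpha / 2) * (B - 1))
    hi_idx = int(round((1 - alpha / 2) * (B - 1)))
    return estimates[lo_idx], estimates[hi_idx]


def coverage_and_width(population, stat, n, n_sims=2000, B=100, level=0.90, seed=0):
    """Estimate CI coverage and mean width over repeated resampled data sets."""
    rng = random.Random(seed)
    # Assumption: the statistic on the original data set plays the role of the truth.
    truth = stat(population)
    covered = 0
    widths = []
    for _ in range(n_sims):
        # Each simulated input data set is drawn with replacement, with size n.
        sample = rng.choices(population, k=n)
        lo, hi = percentile_bootstrap_ci(sample, stat, B=B, level=level, rng=rng)
        covered += int(lo <= truth <= hi)
        widths.append(hi - lo)
    return covered / n_sims, statistics.fmean(widths)


if __name__ == "__main__":
    data_rng = random.Random(1)
    # Synthetic placeholder population; the paper uses the 2016 Census PUMF instead.
    population = [data_rng.gauss(0.0, 1.0) for _ in range(5000)]
    cov, width = coverage_and_width(population, statistics.fmean, n=200, n_sims=100)
    print(f"coverage={cov:.3f}, mean width={width:.3f}")
```

The demo uses a small `n` and `n_sims` so it runs quickly; the settings quoted above would correspond to `n_sims=2000` and `n` taken from 1000, 3000, 10000, 30000, 100000. Replacing `percentile_bootstrap_ci` with a differentially private interval would leave the coverage and width bookkeeping unchanged.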