Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Differentially Private Bootstrap: New Privacy Analysis and Inference Strategies

Authors: Zhanyu Wang, Guang Cheng, Jordan Awan

JMLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our simulations show the advantage of our deconvolution method in terms of the coverage of the CIs compared to Du et al. (2020), and the CI width compared to Brawner and Honaker (2018). We also conduct numerical experiments on the 2016 Canada Census Public Use Microdata, which reveals the dependence between individuals income and shelter cost under DP guarantees by building CIs for the slope parameters of logistic regression and quantile regression.
Researcher Affiliation	Academia	Zhanyu Wang EMAIL Department of Statistics Purdue University West Lafayette, IN 47906, USA Guang Cheng EMAIL Department of Statistics University of California, Los Angeles Los Angeles, CA 90095, USA Jordan Awan EMAIL Department of Statistics Purdue University West Lafayette, IN 47906, USA and Department of Statistics University of Pittsburgh Pittsburgh, PA 15260, USA
Pseudocode	Yes	Algorithm 1 DP bootstrap estimates (with Gaussian mechanism) Algorithm 2 DP bootstrap Asymptotic CI Algorithm 3 DP bootstrap deconvolution sampling distribution and CI
Open Source Code	No	The paper does not explicitly provide a link to source code, a statement of code release, or indicate that code is provided in supplementary materials.
Open Datasets	Yes	We conduct experiments with the 2016 Census Public Use Microdata Files (PUMF), which provide data on the characteristics of the Canadian population (Canada, 2019). Statistics Canada. 2016 Census Public Use Microdata File (PUMF). Individuals File, 2019.
Dataset Splits	Yes	To evaluate the performance of different statistical inference methods, we calculate the coverage and width of the CIs from 2000 simulations for each setting where the input data sets are sampled from the original data set with replacement with size n = 1000, 3000, 10000, 30000, 100000.
Hardware Specification	No	The paper does not explicitly describe the hardware used for running its experiments.
Software Dependencies	No	Among different deconvolution methods, we choose to use deconvolveR (Efron, 2016) since it performs the best in our settings without tuning its hyper-parameters. Using R, we have t1 = 0.2228743..., t2 = . Then δ1 = F_{D2,1}(t1) − e F_{D1,1}(t1) where F is the CDF corresponding to f. Using R, we have δ1 = 0.4475773.
Experiment Setup	Yes	The privacy guarantee is set to be 1-GDP, the confidence level is 90%, and we use B = 100 for bootstrap and DP bootstrap. We set the response yi = 1 if SHELCO ≤ 0.5, otherwise yi = −1. In logistic regression, the model is P(Y | X) = 1/(1 + exp(−θ⊤Y X)), and the empirical risk minimizer (ERM), also the maximum likelihood estimate of θ, is θ̂ = argmin_θ R(θ) where R(θ) := (1/n) Σ_{i=1}^{n} log(P(yi | xi)). ...the true parameter θ = (θ1, θ2) ∈ R² as the regularized ERM estimated with the original data set under the same c. ...the results are shown in the upper figures of Figure 8b where c = 1. ...where we set c = 1 and τ = 0.5.
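
The evaluation protocol quoted under Dataset Splits (resample the original data with replacement, build a CI, and record coverage and width) can be illustrated with a minimal sketch. It is not the paper's method: it assumes a plain, non-private percentile bootstrap for the mean as a stand-in for the DP bootstrap, uses synthetic data instead of the census file, and the names `percentile_ci` and `coverage_and_width` are hypothetical. Only the 90% level, B = 100, and the use of the full-data estimate as the "true" parameter follow the quoted setup.

```python
import numpy as np


def percentile_ci(sample, rng, n_boot=100, level=0.90):
    """Non-private percentile bootstrap CI for the mean (stand-in, not DP)."""
    n = len(sample)
    # Each row of idx is one bootstrap resample of the input sample.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_means = sample[idx].mean(axis=1)
    alpha = 1.0 - level
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return lo, hi


def coverage_and_width(population, n, n_sims, rng, ci_fn=percentile_ci):
    """Estimate CI coverage and mean width over repeated resampling.

    The 'true' parameter is the statistic computed on the full data set,
    mirroring the quoted setup where the truth is estimated from the
    original data.
    """
    truth = population.mean()
    hits = 0
    widths = []
    for _ in range(n_sims):
        # Input data set: sampled from the original data with replacement.
        sample = rng.choice(population, size=n, replace=True)
        lo, hi = ci_fn(sample, rng)
        hits += int(lo <= truth <= hi)
        widths.append(hi - lo)
    return hits / n_sims, float(np.mean(widths))


rng = np.random.default_rng(0)
population = rng.normal(size=5000)  # synthetic placeholder, not census data
coverage, width = coverage_and_width(population, n=1000, n_sims=200, rng=rng)
print(f"coverage={coverage:.3f}, mean width={width:.3f}")
```

The sketch runs 200 simulations at a single sample size to keep the run short; the quoted setup uses 2000 simulations at each of n = 1000, 3000, 10000, 30000, 100000. Passing a different `ci_fn` is how a private interval method would be compared under the same loop.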