Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Adaptive False Discovery Rate Control with Privacy Guarantee

Authors: Xintao Xia, Zhanrui Cai

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical studies demonstrate that the proposed DP-AdaPT performs better compared to the existing differentially private FDR control methods. Compared to the non-private AdaPT, it incurs a small accuracy loss but significantly reduces the computation cost. [...] In this section, we numerically evaluate the performance of the proposed DP-AdaPT in terms of false discovery rate and power. We compare with three other methods: the original AdaPT without privacy guarantee as proposed by Lei and Fithian (2018), the differentially private Benjamini-Hochberg procedure (DP-BH) proposed by Dwork et al. (2021), and the private Bonferroni's method (DP-Bonf) as discussed in Dwork et al. (2021).
Researcher Affiliation | Academia | Xintao Xia EMAIL Department of Statistics Iowa State University Ames, IA 50011-1090, USA. Zhanrui Cai EMAIL Faculty of Business and Economics The University of Hong Kong Hong Kong, China.
Pseudocode | Yes | Algorithm 1 AdaPT (Lei and Fithian, 2018) [...] Algorithm 2 The Report Noisy Min Algorithm [...] Algorithm 3 The Mirror Peeling Algorithm [...] Algorithm 4 The DP-AdaPT Algorithm [...] Algorithm 5 The EM Algorithm [...] Algorithm 6 The Private BHq procedure (Dwork et al., 2021)
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It only mentions the license for the paper itself, but no link or statement about code for their implementation.
Open Datasets | Yes | The Bottomly data set is an RNA-seq data set collected by Bottomly et al. (2011) to detect differential striatal gene expression between the C57BL/6J (B6) and DBA/2J (D2) inbred mouse strains.
Dataset Splits | No | The paper mentions generating p-values for simulations and describes the size of the Bottomly dataset (n = 13932 genes), but it does not specify explicit training/test/validation splits with percentages, counts, or predefined citations for the experiments conducted.
Hardware Specification | Yes | The computation time is based on an HPC cluster with CPU Model Intel Xeon Gold 6152 and RAM 10 GB.
Software Dependencies | No | For the AdaPT procedure, we follow a similar algorithm as in Lei and Fithian (2018) and fit two-dimensional Generalized Additive Models in the M-step, using R package mgcv with the knots selected automatically in every step by GCV criterion. The version number for the 'mgcv' package is not specified.
Experiment Setup | Yes | Specifically, we set the total number of hypotheses to be n = 100,000, with the number of true effects t = 100. We select m = 500 in the peeling step. Let p_i = Φ(ξ_i − β) for i = 1, ..., t, where Φ(·) is the CDF of standard normal distribution and ξ_1, ..., ξ_m are i.i.d. standard normal distribution. We set the signal β to be 4 and the significance level α = 0.1. Other parameters are required for the DP-BH algorithm, which is summarized in Algorithm 6. Two parameters are used to control the sensitivity of the p-values in the DP-BH algorithm: the multiplicative sensitivity η and the truncation threshold ν. We set η as 0.0001 and ν = 0.5α/n, which are the same as Dwork et al. (2021). The privacy parameters are also set to be the same as in Dwork et al. (2021): ϵ = 0.5 and δ = 0.001. For our proposed DP-AdaPT procedure, we set the privacy parameter µ = 4ϵ/√(10 log(1/δ)).
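The simulation settings quoted in the Experiment Setup entry can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code (the paper reports using R): it reads the true-effect p-value formula as p_i = Φ(ξ_i − β), it assumes the null p-values are drawn from Uniform(0, 1) (the quoted text does not say how they are generated), and the function names `simulate_p_values` and `privacy_parameters` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def simulate_p_values(n=100_000, t=100, beta=4.0, seed=0):
    """Return n p-values: the first t are true effects, the rest are nulls."""
    rng = random.Random(seed)
    std_normal = NormalDist()  # standard normal, mean 0 and sd 1
    # True effects, reading the quoted formula as p_i = Phi(xi_i - beta).
    signals = [std_normal.cdf(rng.gauss(0.0, 1.0) - beta) for _ in range(t)]
    # ASSUMPTION: null p-values are Uniform(0, 1); not stated in the source.
    nulls = [rng.random() for _ in range(n - t)]
    return signals + nulls


def privacy_parameters(n=100_000, alpha=0.1, epsilon=0.5, delta=0.001):
    """Collect the tuning parameters quoted in the setup."""
    nu = 0.5 * alpha / n  # truncation threshold for DP-BH
    # Privacy parameter for DP-AdaPT: mu = 4 * epsilon / sqrt(10 * log(1 / delta)).
    mu = 4 * epsilon / math.sqrt(10 * math.log(1 / delta))
    return {"eta": 1e-4, "nu": nu, "epsilon": epsilon, "delta": delta, "mu": mu}
```

With the quoted values ϵ = 0.5 and δ = 0.001, the formula gives µ of roughly 0.24, and ν = 0.5α/n evaluates to 5e-7 for n = 100,000 and α = 0.1.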