NeuralFDR: Learning Discovery Thresholds from Hypothesis Features

Authors: Fei Xia, Martin J. Zhang, James Y. Zou, David Tse

NeurIPS 2017

Reproducibility assessment: for each variable, the assessed result and the supporting LLM response (excerpts from the paper where applicable).
Research Type: Experimental
"We provide extensive simulation on synthetic and real datasets to demonstrate that our algorithm makes more discoveries while controlling FDR compared to state-of-the-art methods." "We evaluate our method using both simulated data and two real-world datasets."
Researcher Affiliation: Academia
"Fei Xia, Martin J. Zhang, James Zou, David Tse, Stanford University, {feixia,jinye,jamesz,dntse}@stanford.edu"
Pseudocode: Yes
"Algorithm 1 NeuralFDR"
Open Source Code: Yes
"We released the software at https://github.com/fxia22/NeuralFDR"
Open Datasets: Yes
"We evaluate our method using both simulated data and two real-world datasets. The implementation details are in Supp. Sec. 2. We compare NeuralFDR with three other methods: the BH procedure (BH) [3], Storey's BH procedure (SBH) with threshold λ = 0.4 [21], and Independent Hypothesis Weighting (IHW) with the number of bins and folds set to the defaults [15]. We first consider Data IHW, the simulated data in the IHW paper (Supp. 7.2.2 [15]). Airway data [11] is an RNA-Seq dataset... The GTEx [6] study quantifies expression quantitative trait loci (eQTLs) in human tissues."
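For context on the baselines named in the excerpt, the classic BH step-up procedure can be sketched in a few lines. This is a minimal NumPy sketch under our own naming (it is not the paper's code, and SBH/IHW add further machinery on top of this):

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.1):
    """BH step-up procedure: boolean mask of discoveries at nominal FDR alpha."""
    p = np.asarray(pvals, dtype=float)
    n = len(p)
    order = np.argsort(p)
    ranked = p[order]
    # Find the largest rank k with p_(k) <= alpha * k / n.
    below = ranked <= alpha * np.arange(1, n + 1) / n
    mask = np.zeros(n, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        mask[order[: k + 1]] = True  # reject all hypotheses up to rank k
    return mask
```

A single threshold for all hypotheses is exactly what NeuralFDR generalizes: it learns a per-hypothesis threshold t(x) from the feature x instead.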
Dataset Splits: Yes
"Third, we use cross validation to address the overfitting problem introduced by optimization. To be more specific, we divide the data into M folds. For fold j, the decision rule t_j(x; θ), before being applied on fold j, is trained and cross-validated on the rest of the data. The cross validation is done by rescaling the learned threshold t_j(x) by a factor γ_j so that the corresponding mirror estimate of the FDP on the CV set is α." "Algorithm 1 NeuralFDR: 1: Randomly divide the data {(P_i, X_i)}, i = 1, ..., n, into M folds. 3: Let the testing data be fold j, the CV data be fold j′ ≠ j, and the training data be the rest."
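The fold-wise rescaling described in that excerpt can be illustrated with a simplified sketch. We assume a vector of learned per-hypothesis thresholds on the CV fold and use the simple mirror estimator #{p ≥ 1 − t} / #{p ≤ t} as the FDP proxy; the function names, the grid search over γ, and the clipping are our own simplifications, not the paper's exact procedure:

```python
import numpy as np

def mirror_fdp(pvals, thresh):
    # Mirror estimate of the false discovery proportion: p-values landing in the
    # mirrored region [1 - t, 1] serve as a proxy count for false discoveries.
    false = int(np.sum(pvals >= 1.0 - thresh))
    disc = max(int(np.sum(pvals <= thresh)), 1)
    return false / disc

def rescale_threshold(pvals_cv, thresh_cv, alpha, grid=None):
    """Pick the largest factor gamma on a grid such that the rescaled thresholds
    gamma * t(x) keep the mirror FDP estimate on the CV fold at or below alpha."""
    if grid is None:
        grid = np.linspace(0.05, 2.0, 100)
    gamma = grid[0]
    for g in grid:
        if mirror_fdp(pvals_cv, np.clip(g * thresh_cv, 0.0, 0.5)) <= alpha:
            gamma = g
    return gamma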
Hardware Specification No The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory, cloud instances) used for running the experiments.
Software Dependencies No The paper mentions 'DESeq2 [20]' but does not specify its version number. It discusses neural networks but does not list specific deep learning libraries (e.g., TensorFlow, PyTorch) with version numbers.
Experiment Setup Yes In all experiments, k is set to 20 and the group index is provided to IHW as the hypothesis feature. Other than the FDR control experiment, we set the nominal FDR level = 0.1. First, to have non-vanishing gradients, the indicator functions in (3) are substituted by sigmoid functions with the intensity parameters automatically chosen based on the dataset. Second, the training process of the neural network may be unstable if we use random initialization. Hence, we use an initialization method called the k-cluster initialization: 1) use k-means clustering to divide the data into k clusters based on the features; 2) compute the optimal threshold for each cluster based on the optimal group threshold condition ((7) in Sec. 5); 3) initialize the neural network by training it to fit a smoothed version of the computed thresholds.