Estimating the Number and Effect Sizes of Non-null Hypotheses
Authors: Jennifer Brennan, Ramya Korlakai Vinayak, Kevin Jamieson
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our estimator on both real and simulated data. We begin with the mixture of two Gaussians described by Eqn (5). Figure 3 shows the rate of convergence of our estimator for different values of γ, the alternate effect size. Note that the estimate never exceeds the true value ζ, and that it improves as n increases. The variance of our estimator, shown with bootstrapped 90% confidence intervals, can be large for small n but decreases as n increases. |
| Researcher Affiliation | Academia | Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA. Correspondence to: Jennifer Brennan <jrb@cs.washington.edu>. |
| Pseudocode | No | The paper describes the intuition and properties of its estimator and states it "can be implemented as an efficient convex program" and "solved using off-the-shelf software (see Appendix C for details)", but it does not provide a formal pseudocode block or algorithm steps within the main text or appendix. |
| Open Source Code | Yes | A Python implementation is available at https://github.com/jenniferbrennan/CountingDiscoveries/. |
| Open Datasets | Yes | We evaluated our estimator on Z-scores from an experiment to identify which genes contribute to influenza replication in Drosophila, described by Hao et al. (2008). |
| Dataset Splits | No | The paper does not provide explicit details on dataset splits (e.g., specific percentages or counts for training, validation, or testing sets). For the real data, it mentions "two replicates" for the Drosophila genes but not how the data was partitioned for model development or evaluation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU specifications, or cloud computing instance types. |
| Software Dependencies | No | The paper mentions a "Python implementation" and, in Appendix C, names "CVXPY" and "SCS" as the tools used for convex optimization, but it does not give version numbers for these or any other software dependencies, so the software environment cannot be reproduced exactly. |
| Experiment Setup | Yes | For simulation experiments, it states: "After observing Xᵢ ∼ N(µᵢ, 1) for i = 1, ..., n with n = 10⁴" and "For a fixed value of n = 10⁴, we are interested in the probability...". For real data, it notes: "The data... consisted of Z-scores from two replicates for each of 13,071 genes." and "We found that σ² = 1/4 provided a good fit to the data; we used this value for the rest of our computations." These details describe the configuration for experiments. |
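The simulation setup quoted in the Research Type and Experiment Setup rows (observations Xᵢ ∼ N(µᵢ, 1), n = 10⁴, bootstrapped 90% confidence intervals) can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' code: it assumes Eqn (5) is the mixture (1 − ζ)N(0, 1) + ζN(γ, 1), and it swaps in a simple threshold-based lower bound on ζ as a stand-in estimator, because the paper's actual estimator is a convex program (solved with CVXPY and SCS per Appendix C) that is not reproduced here. The function names, the threshold t = 2, the values ζ = 0.1 and γ = 2, and the 200 bootstrap resamples are all hypothetical choices.

```python
import random
from statistics import NormalDist


def simulate_mixture(n, zeta, gamma, seed=0):
    """Draw X_i ~ N(mu_i, 1) with mu_i = gamma w.p. zeta, else 0.

    ASSUMPTION: this is our guess at the two-Gaussian mixture of Eqn (5).
    """
    rng = random.Random(seed)
    return [rng.gauss(gamma if rng.random() < zeta else 0.0, 1.0) for _ in range(n)]


def threshold_lower_bound(xs, t=2.0):
    """Hypothetical stand-in estimator, NOT the paper's convex program.

    Under the mixture, P(X > t) - P(Z > t) = zeta * (P(N(gamma,1) > t) - P(Z > t))
    <= zeta * (1 - P(Z > t)), so the statistic below is <= zeta in expectation.
    """
    tail = 1.0 - NormalDist().cdf(t)  # P(Z > t) under the N(0, 1) null
    frac_above = sum(x > t for x in xs) / len(xs)  # empirical P(X > t)
    return max(0.0, (frac_above - tail) / (1.0 - tail))  # clamp at 0


def bootstrap_ci(xs, estimator, level=0.90, reps=200, seed=1):
    """Percentile bootstrap interval: resample with replacement, re-estimate."""
    rng = random.Random(seed)
    n = len(xs)
    stats = sorted(estimator(rng.choices(xs, k=n)) for _ in range(reps))
    alpha = (1.0 - level) / 2.0
    lo = stats[round(alpha * reps)]
    hi = stats[min(reps - 1, round((1.0 - alpha) * reps))]
    return lo, hi


TRUE_ZETA = 0.1
xs = simulate_mixture(n=10**4, zeta=TRUE_ZETA, gamma=2.0)
estimate = threshold_lower_bound(xs)
ci_lo, ci_hi = bootstrap_ci(xs, threshold_lower_bound)
print(f"estimate {estimate:.3f}, 90% CI ({ci_lo:.3f}, {ci_hi:.3f}), true zeta {TRUE_ZETA}")
```

With these settings the stand-in estimate lands well below the true ζ = 0.1, mirroring the conservative behavior the paper reports for its own estimator; a weaker effect size γ would make this simple bound looser still.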