A Powerful Global Test Statistic for Functional Statistical Inference

Authors: Jingwen Zhang, Joseph Ibrahim, Tengfei Li, Hongtu Zhu. Pages 5765-5772

AAAI 2019

Reproducibility Variable Result LLM Response
Research Type Experimental We use simulations to show that the proposed test outperforms all existing state-of-the-art methods in functional statistical inference. Finally, we apply the proposed testing method to the genome-wide association analysis of imaging genetic data in UK Biobank dataset.
Researcher Affiliation Academia (1) Department of Biostatistics, UNC Gillings School of Global Public Health; (2) Biomedical Research Imaging Center, UNC School of Medicine, University of North Carolina at Chapel Hill
Pseudocode Yes Algorithm 1. (a) Fit the varying coefficient model under the null hypothesis and get the estimates $\hat{\beta}_0(s)$ and $\{\hat{\eta}_{i,0}(s), \hat{e}_{i,0}(s)\}_{i=1}^{n}$. (b) For $g = 1, \ldots, G$, generate independent random numbers $\nu_i^{(g)}$ and $\nu_i^{(g)}(s_m)$ from $N(0, 1)$, and the wild bootstrap sample on each grid point can be calculated as $\hat{y}_i^{(g)}(s_m) = \hat{\beta}_0(s_m)^T x_i + \nu_i^{(g)} \hat{\eta}_{i,0}(s_m) + \nu_i^{(g)}(s_m) \hat{e}_{i,0}(s_m)$. (c) Repeat the testing procedure and obtain $G$ samples of $T_n(\hat{\omega}_\lambda^{(g)})$ under the null hypothesis. (d) The p-value is approximated by $p = G^{-1} \sum_{g=1}^{G} I\{T_n(\hat{\omega}_\lambda) \le T_n(\hat{\omega}_\lambda^{(g)})\}$.
Open Source Code No The paper does not provide an explicit statement or link for open-source code of the described methodology. A link is provided for the proof of a theorem, not the code.
Open Datasets Yes Finally, we apply the proposed testing method to the genome-wide association analysis of imaging genetic data in UK Biobank dataset.
Dataset Splits No The paper mentions total sample sizes and the number of simulation replicates, but it does not explicitly specify training, validation, or test dataset splits using percentages, sample counts, or references to predefined splits.
Hardware Specification No The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments.
Software Dependencies No The paper mentions the "FSL tool set (Jenkinson et al. 2012)" but does not provide specific version numbers for this or any other software dependencies, which are necessary for reproducibility.
Experiment Setup Yes We set $n = 200$ and $S = 1$ and placed $M = 100$ grid points evenly in $[0, 1]$. For the choice of the tuning parameter, we considered both fixed quantities where $\log \lambda_n$ takes values from $[-2, 0]$ with an equal increment of 0.1 (PFGT-$\lambda_n$) and an optimal $\hat{\lambda}_n$ selected by (19) in each run (PFGT-optimal). In each scenario, 1,000 simulation replicates were generated to evaluate type I and type II error rates respectively. To calculate p-values, $G = 1{,}000$ wild-bootstrap samples were generated in each run. We fitted model (1) with covariates including an intercept term, a specific SNP, age, gender, and the top 5 genetic principal components. For each MAF category, we generated 10,000 bootstrap samples and adopted a mixed chi-square approximation (Zhang 2005) to approximate the null distribution of the test statistic.
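The resampling in steps (b) through (d) of Algorithm 1 can be sketched in Python with NumPy. This is a minimal illustration, not the authors' implementation: the function names, the array layout, and `toy_statistic` are all hypothetical, and the toy statistic is a stand-in for the paper's test statistic, which is not reproduced here. The sketch also assumes that large values of the statistic count as evidence against the null, so the p-value is the share of bootstrap statistics at least as large as the observed one. The demo reuses n = 200, M = 100 and G = 1,000 from the quoted setup.

```python
import numpy as np


def wild_bootstrap_samples(beta0, X, eta0, e0, G, rng):
    """Yield G wild-bootstrap response matrices of shape (n, M).

    beta0: (M, p) null-model coefficient estimates at the M grid points
    X:     (n, p) covariates
    eta0:  (n, M) estimated individual functions under the null
    e0:    (n, M) estimated measurement errors under the null
    """
    n, M = eta0.shape
    mean = X @ beta0.T  # (n, M): beta0(s_m)^T x_i for every i and m
    for _ in range(G):
        nu_subject = rng.standard_normal((n, 1))  # one draw per subject
        nu_point = rng.standard_normal((n, M))    # one draw per subject and grid point
        yield mean + nu_subject * eta0 + nu_point * e0


def bootstrap_p_value(t_obs, t_boot):
    """Share of bootstrap statistics that are >= the observed statistic."""
    return float(np.mean(np.asarray(t_boot) >= t_obs))


def toy_statistic(y):
    """Hypothetical placeholder statistic (NOT the paper's): scaled squared column means."""
    return float(y.shape[0] * np.sum(np.mean(y, axis=0) ** 2))


# Demo with synthetic inputs; every array below is made up for illustration.
rng = np.random.default_rng(0)
n, M, G = 200, 100, 1000
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta0 = np.zeros((M, 2))
eta0 = rng.standard_normal((n, 1)) * np.ones((1, M))
e0 = 0.5 * rng.standard_normal((n, M))

y_obs = X @ beta0.T + eta0 + e0
t_obs = toy_statistic(y_obs)
t_boot = [toy_statistic(y) for y in wild_bootstrap_samples(beta0, X, eta0, e0, G, rng)]
p_value = bootstrap_p_value(t_obs, t_boot)
print(f"approximate p-value: {p_value:.3f}")
```

In a real analysis the placeholder would be replaced by the full testing procedure rerun on each bootstrap sample, as step (c) describes.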