A Kernel-based Test of Independence for Cluster-correlated Data

Authors: Hongjiao Liu, Anna Plantinga, Yunhua Xiang, Michael Wu

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Based on both simulation studies and real data analysis, we show that, with clustered data, our approach effectively controls type I error and has a higher statistical power than competing methods. |
| Researcher Affiliation | Academia | Hongjiao Liu, Department of Biostatistics, University of Washington, liuhj@uw.edu; Anna M. Plantinga, Department of Mathematics and Statistics, Williams College, amp9@williams.edu; Yunhua Xiang, Department of Biostatistics, University of Washington, xiangyh@uw.edu; Michael C. Wu, Public Health Sciences Division, Fred Hutchinson Cancer Research Center, mcwu@fredhutch.org |
| Pseudocode | No | The paper does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | All of our codes are implemented in R, and are available at https://github.com/Liujiao92/HSICcl. |
| Open Datasets | Yes | Here we apply HSICcl and competing methods to test the dependence between the overall vaginal microbiome composition and different metabolic pathways, using data from the Menopause Strategies: Finding Lasting Answers for Symptoms and Health (MsFLASH) Vaginal Health Trial [27]. |
| Dataset Splits | No | The paper does not specify explicit training, validation, or test dataset splits. It mentions using m clusters and d time points for simulations and real data analysis, but no partitioning for model training/validation/testing. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper states 'All of our codes are implemented in R' but does not specify the version of R or any specific R libraries with their version numbers. |
| Experiment Setup | Yes | For both X and Y, we consider two different kernels: the Gaussian kernel $k_X(z_1, z_2) = k_Y(z_1, z_2) = \exp(-\lVert z_1 - z_2 \rVert_2^2 / \tau)$ and the linear kernel $k_X(z_1, z_2) = k_Y(z_1, z_2) = z_1^T z_2$. For the Gaussian kernel, the shape parameter τ is chosen as the median of the Euclidean distance between each sample pair. |
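
The two kernels named in the Experiment Setup entry can be illustrated with a short sketch. This is not the authors' code (theirs is in R, in the repository linked above); it is a standard-library Python illustration. The function names are hypothetical, and it assumes the median is taken over distinct sample pairs (i < j), which the quoted text does not state explicitly.

```python
import math
from statistics import median


def pairwise_sq_dists(Z):
    """Squared Euclidean distance between every pair of rows in Z."""
    n = len(Z)
    return [
        [sum((a - b) ** 2 for a, b in zip(Z[i], Z[j])) for j in range(n)]
        for i in range(n)
    ]


def median_heuristic(Z):
    """Median Euclidean distance over distinct pairs (i < j).

    Assumption: self-pairs and duplicate orderings are excluded.
    """
    d2 = pairwise_sq_dists(Z)
    n = len(Z)
    return median(math.sqrt(d2[i][j]) for i in range(n) for j in range(i + 1, n))


def gaussian_kernel(Z, tau=None):
    """Gram matrix with entries exp(-||z_i - z_j||_2^2 / tau).

    If tau is None it is set by the median heuristic above. Note that tau
    would be zero (and the division would fail) if all samples coincide.
    """
    if tau is None:
        tau = median_heuristic(Z)
    return [[math.exp(-v / tau) for v in row] for row in pairwise_sq_dists(Z)]


def linear_kernel(Z):
    """Gram matrix with entries z_i^T z_j."""
    return [[sum(a * b for a, b in zip(zi, zj)) for zj in Z] for zi in Z]


# Toy data: three samples in two dimensions.
Z = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
tau = median_heuristic(Z)
K_gauss = gaussian_kernel(Z)
K_lin = linear_kernel(Z)
```

In this sketch τ is the median of the distances themselves, while the exponent uses the squared distance, matching the wording of the entry above. Other implementations sometimes use the median of squared distances instead, so the exact convention should be checked against the authors' repository.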