Finding significant combinations of features in the presence of categorical covariates

Authors: Laetitia Papaxanthos, Felipe Llinares-López, Dean Bodenham, Karsten Borgwardt

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | FACS demonstrates superior speed and statistical power on simulated and real-world datasets compared to the state of the art, opening the door to numerous applications in biomedicine.
Researcher Affiliation | Academia | Machine Learning and Computational Biology Lab, D-BSSE, ETH Zurich
Pseudocode | Yes | Algorithm 1: FACS; Algorithm 2: tarone_cmh
Open Source Code | Yes | code for FACS is available on GitHub (footnote 2: https://github.com/BorgwardtLab/FACS)
Open Datasets | Yes | A. thaliana GWAS: We apply FACS, LAMP-χ² and Bonf-CMH to two datasets from the plant model organism A. thaliana [1]... The breast cancer data set, as used in [15]
Dataset Splits | No | The paper describes generating synthetic datasets and using real-world datasets, but it does not explicitly provide details about training, validation, and test splits (e.g., percentages or sample counts) for reproducibility.
Hardware Specification | No | The paper does not provide any specific details regarding the hardware used to run the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies used in the experiments.
Experiment Setup | Yes | We generated synthetic datasets with one truly associated feature subset S_true and one confounded feature subset S_conf to evaluate precision and ability to correct for confounders... We set ρ_true = ρ_conf = ρ... contain 84 and 95 samples, respectively... Each plant sample is represented by a sequence of approximately 214,000 genetic bases... we downsampled each of the five chromosomes... by a factor of 20, using 20 different offsets... containing between 1,423 and 2,661 features... For both datasets we condition on the ancestry, resulting in k = 5 and k = 3 categories for the covariate... includes 12,773 genes classified into up-regulated or not up-regulated. Each gene is represented by 397 binary features... Two sets of experiments were conducted, conditioning on 8 and 16 categories respectively.
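
The downsampling step quoted in the Experiment Setup row (a factor of 20 with 20 different offsets) can be read as keeping every 20th feature of a chromosome, starting from each of the offsets 0 to 19. The sketch below illustrates that reading only; it is an assumption, not the authors' code. The function name `downsample_with_offsets` and the random binary matrix are hypothetical, and the matrix dimensions are chosen purely for illustration.

```python
import numpy as np


def downsample_with_offsets(X, factor=20):
    """Return `factor` column subsets of X.

    Subset `o` keeps columns o, o + factor, o + 2 * factor, ...
    This is one assumed interpretation of "downsampled by a factor
    of 20, using 20 different offsets".
    """
    return [X[:, offset::factor] for offset in range(factor)]


# Hypothetical stand-in for one chromosome: 84 samples, 42,800 binary features.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(84, 42800))

subsets = downsample_with_offsets(X, factor=20)
print(len(subsets), subsets[0].shape)
```

Under this reading, about 214,000 bases spread over five chromosomes and thinned by a factor of 20 give roughly two thousand features per subset, which is in line with the quoted range of 1,423 to 2,661 features.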