Finding significant combinations of features in the presence of categorical covariates
Authors: Laetitia Papaxanthos, Felipe Llinares-López, Dean Bodenham, Karsten Borgwardt
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | FACS demonstrates superior speed and statistical power on simulated and real-world datasets compared to the state of the art, opening the door to numerous applications in biomedicine. |
| Researcher Affiliation | Academia | Machine Learning and Computational Biology Lab D-BSSE, ETH Zurich |
| Pseudocode | Yes | Algorithm 1: FACS; Algorithm 2: tarone_cmh |
| Open Source Code | Yes | Code for FACS is available on GitHub: https://github.com/BorgwardtLab/FACS |
| Open Datasets | Yes | A. thaliana GWAS: We apply FACS, LAMP-χ² and Bonf-CMH to two datasets from the plant model organism A. thaliana [1]... The breast cancer data set, as used in [15] |
| Dataset Splits | No | The paper describes generating synthetic datasets and using real-world datasets, but it does not explicitly provide details about training, validation, and test splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies used in the experiments. |
| Experiment Setup | Yes | We generated synthetic datasets with one truly associated feature subset S_true and one confounded feature subset S_conf to evaluate precision and ability to correct for confounders... We set ρ_true = ρ_conf = ρ... contain 84 and 95 samples, respectively... Each plant sample is represented by a sequence of approximately 214,000 genetic bases... we downsampled each of the five chromosomes... by a factor of 20, using 20 different offsets... containing between 1,423 and 2,661 features... For both datasets we condition on the ancestry, resulting in k = 5 and k = 3 categories for the covariate... includes 12,773 genes classified into up-regulated or not up-regulated. Each gene is represented by 397 binary features... Two sets of experiments were conducted, conditioning on 8 and 16 categories respectively. |
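The table names the two ingredients of FACS (a Cochran-Mantel-Haenszel test that conditions on a k-category covariate, and Tarone's testability criterion, via `tarone_cmh`) without showing what they compute. The sketch below is not the authors' code; it is a minimal illustration assuming the standard CMH statistic without continuity correction, one 2×2 table (feature present/absent × label 1/0) per covariate category, and a brute-force search for the minimum attainable p-value. The function names and argument names are hypothetical. Under Tarone's idea, a feature combination is "testable" at a threshold δ only if its minimum attainable p-value is at most δ; untestable combinations can never become significant and need not count toward the multiple-testing correction.

```python
import itertools
import math
from typing import Sequence


def cmh_pvalue(a: Sequence[int], x: Sequence[int],
               n1: Sequence[int], n: Sequence[int]) -> float:
    """CMH test (no continuity correction) for one binary feature combination.

    Hypothetical interface. For each stratum j (one category of the covariate):
      n[j]  - number of samples in the stratum
      n1[j] - number of samples with label 1 in the stratum
      x[j]  - number of samples in which the feature combination is present
      a[j]  - number of samples with the combination present AND label 1
    """
    num = 0.0
    var = 0.0
    for a_j, x_j, n1_j, n_j in zip(a, x, n1, n):
        if n_j < 2:
            continue  # hypergeometric variance is undefined for fewer than 2 samples
        num += a_j - x_j * n1_j / n_j  # observed minus expected count
        var += x_j * (n_j - x_j) * n1_j * (n_j - n1_j) / (n_j ** 2 * (n_j - 1))
    if var == 0.0:
        return 1.0  # statistic undefined: treat the combination as non-significant
    t = num ** 2 / var
    # Survival function of a chi-square with 1 degree of freedom:
    # P(X > t) = P(|Z| > sqrt(t)) = erfc(sqrt(t / 2))
    return math.erfc(math.sqrt(t / 2.0))


def min_attainable_pvalue(x: Sequence[int], n1: Sequence[int],
                          n: Sequence[int]) -> float:
    """Smallest CMH p-value over every table consistent with the margins.

    Brute force over all feasible (a_1, ..., a_k); exponential in k, so this
    is only meant for small examples, not for an efficient implementation.
    """
    ranges = [range(max(0, x_j - (n_j - n1_j)), min(x_j, n1_j) + 1)
              for x_j, n1_j, n_j in zip(x, n1, n)]
    return min(cmh_pvalue(a, x, n1, n) for a in itertools.product(*ranges))


def is_testable(x: Sequence[int], n1: Sequence[int], n: Sequence[int],
                delta: float) -> bool:
    """Tarone-style check: the combination can only reach significance if p_min <= delta."""
    return min_attainable_pvalue(x, n1, n) <= delta
```

For example, with two covariate categories of 10 samples each (5 with label 1), a combination present in 5 samples per category has a minimum attainable p-value of erfc(3) ≈ 2.2e-5, whereas a combination absent from every sample has a minimum attainable p-value of 1 and is never testable. Brute-force enumeration is chosen here only because it is obviously correct; it says nothing about how FACS itself computes or bounds the minimum p-value.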