reproducibilityindex.ai

Finding significant combinations of features in the presence of categorical covariates

Authors: Laetitia Papaxanthos, Felipe Llinares-López, Dean Bodenham, Karsten Borgwardt

NeurIPS 2016

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	FACS demonstrates superior speed and statistical power on simulated and real-world datasets compared to the state of the art, opening the door to numerous applications in biomedicine.
Researcher Affiliation	Academia	Machine Learning and Computational Biology Lab, D-BSSE, ETH Zurich
Pseudocode	Yes	Algorithm 1: FACS; Algorithm 2: tarone_cmh
Open Source Code	Yes	code for FACS is available on GitHub (footnote 2: https://github.com/BorgwardtLab/FACS)
Open Datasets	Yes	A. thaliana GWAS: We apply FACS, LAMP-χ² and Bonf-CMH to two datasets from the plant model organism A. thaliana [1]... The breast cancer data set, as used in [15]
Dataset Splits	No	The paper describes generating synthetic datasets and using real-world datasets, but it does not explicitly provide details about training, validation, and test splits (e.g., percentages or sample counts) for reproducibility.
Hardware Specification	No	The paper does not provide any specific details regarding the hardware used to run the experiments.
Software Dependencies	No	The paper does not provide specific version numbers for any software dependencies used in the experiments.
Experiment Setup	Yes	We generated synthetic datasets with one truly associated feature subset S_true and one confounded feature subset S_conf to evaluate precision and ability to correct for confounders... We set ρ_true = ρ_conf = ρ... contain 84 and 95 samples, respectively... Each plant sample is represented by a sequence of approximately 214,000 genetic bases... we downsampled each of the five chromosomes... by a factor of 20, using 20 different offsets... containing between 1,423 and 2,661 features... For both datasets we condition on the ancestry, resulting in k = 5 and k = 3 categories for the covariate... includes 12,773 genes classified into up-regulated or not up-regulated. Each gene is represented by 397 binary features... Two sets of experiments were conducted, conditioning on 8 and 16 categories respectively.
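
The Experiment Setup entry mentions downsampling each chromosome "by a factor of 20, using 20 different offsets". The excerpt does not say exactly how the offsets were applied, so the following is only a minimal sketch under one plausible reading: each offset keeps every 20th feature starting at that position, which yields 20 disjoint subsets per chromosome. The function name `downsample_with_offsets` and the toy data are hypothetical and are not taken from the FACS code base.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def downsample_with_offsets(features: Sequence[T], factor: int = 20) -> List[List[T]]:
    """Split `features` into `factor` strided subsets.

    Subset `o` keeps the items at positions o, o + factor, o + 2 * factor, ...
    This is an assumed interpretation of "factor of 20, 20 different offsets".
    """
    if factor <= 0:
        raise ValueError("factor must be a positive integer")
    return [list(features[offset::factor]) for offset in range(factor)]


# Toy stand-in for one chromosome: 100 feature indices (hypothetical data).
chromosome = list(range(100))
subsets = downsample_with_offsets(chromosome, factor=20)
```

Under this reading every original feature appears in exactly one subset, so the 20 downsampled datasets together cover the whole chromosome without overlap.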