DeepCoDA: personalized interpretability for compositional health data

Authors: Thomas Quinn, Dang Nguyen, Santu Rana, Sunil Gupta, Svetha Venkatesh

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our architecture maintains state-of-the-art performance across 25 real-world data sets, all while producing interpretations that are both personalized and fully coherent for compositional data.
Researcher Affiliation | Academia | Applied Artificial Intelligence Institute (A2I2), Deakin University, Geelong, Australia.
Pseudocode | No | The paper describes the network architecture and modules using equations and descriptive text, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Our implementation of DeepCoDA is available from http://github.com/nphdang/DeepCoDA.
Open Datasets | Yes | The first contains 13 data sets from (Quinn and Erb, 2020), curated to benchmark compositional data analysis methods for microbiome and similar data (available from https://zenodo.org/record/3378099/). The second contains 12 data sets from (Vangay et al., 2019), curated to benchmark machine learning methods for microbiome data (available from https://knights-lab.github.io/MLRepo/).
Dataset Splits | Yes | We develop the model in two stages. First, we use a discovery set of 13 data sets to design the architecture and choose its hyper-parameters. Second, we use a verification set of 12 unseen data sets to benchmark the final model.
Hardware Specification | No | The paper does not provide specific hardware details, such as GPU models, CPU types, or memory specifications, used for running experiments.
Software Dependencies | No | The paper refers to Python notebooks and uses various statistical and machine learning methods, but does not specify version numbers for any software dependencies or libraries.
Experiment Setup | Yes | Using a discovery set of 13 data sets, we trained models with B = [1, 3, 5, 10] log-bottlenecks and an L1 penalty of λs = [0.001, 0.01, 0.1, 1]. Figure 3 shows the standardized performance for all discovery set models for each hyper-parameter combination. Here, we see that B = 5 and λs = 0.01 works well with or without self-explanation.
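The hyper-parameter search described in the experiment setup can be sketched as a simple grid sweep. This is a minimal illustration, not the authors' code: `evaluate` is a hypothetical callable standing in for training a DeepCoDA model with B log-bottlenecks and L1 weight λs, then scoring it on the discovery set.

```python
from itertools import product

# Hyper-parameter grid reported in the paper: number of log-bottlenecks B
# and the L1 penalty weight lambda_s.
B_VALUES = [1, 3, 5, 10]
LAMBDA_VALUES = [0.001, 0.01, 0.1, 1]

def sweep(evaluate):
    """Run `evaluate` (a hypothetical train-and-score callable taking
    (B, lambda_s)) over the full grid; return scores keyed by setting."""
    return {(b, lam): evaluate(b, lam)
            for b, lam in product(B_VALUES, LAMBDA_VALUES)}

# Example with a dummy evaluator; the paper reports that B = 5 and
# lambda_s = 0.01 performed well on the 13-dataset discovery set.
results = sweep(lambda b, lam: 0.0)
print(len(results))  # 16 combinations (4 x 4 grid)
```

In the paper's two-stage protocol, the best setting from this grid on the 13 discovery data sets is then fixed before benchmarking on the 12 unseen verification data sets.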