DeepCoDA: personalized interpretability for compositional health data
Authors: Thomas Quinn, Dang Nguyen, Santu Rana, Sunil Gupta, Svetha Venkatesh
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our architecture maintains state-of-the-art performance across 25 real-world data sets, all while producing interpretations that are both personalized and fully coherent for compositional data. |
| Researcher Affiliation | Academia | 1Applied Artificial Intelligence Institute (A2I2), Deakin University, Geelong, Australia. |
| Pseudocode | No | The paper describes the network architecture and modules using equations and descriptive text, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Our implementation of DeepCoDA is available from http://github.com/nphdang/DeepCoDA. |
| Open Datasets | Yes | The first contains 13 data sets from (Quinn and Erb, 2020), curated to benchmark compositional data analysis methods for microbiome and similar data [1]. The second contains 12 data sets from (Vangay et al., 2019), curated to benchmark machine learning methods for microbiome data [2]. [1] Available from https://zenodo.org/record/3378099/ [2] Available from https://knights-lab.github.io/MLRepo/ |
| Dataset Splits | Yes | We develop the model in two stages. First, we use a discovery set of 13 data sets to design the architecture and choose its hyper-parameters. Second, we use a verification set of 12 unseen data sets to benchmark the final model. |
| Hardware Specification | No | The paper does not provide specific hardware details like GPU models, CPU types, or memory specifications used for running experiments. |
| Software Dependencies | No | The paper refers to Python notebooks and uses various statistical and machine learning methods, but does not specify version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | Using a discovery set of 13 data sets, we trained models with B = [1, 3, 5, 10] log-bottlenecks and a λs = [0.001, 0.01, 0.1, 1] L1 penalty. Figure 3 shows the standardized performance for all discovery set models for each hyper-parameter combination. Here, we see that B = 5 and λs = 0.01 works well with or without self-explanation. |
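The experiment setup above describes an exhaustive sweep over two hyper-parameters. As a minimal sketch, assuming a plain grid enumeration (the paper does not specify the search procedure beyond the candidate values), the 16 trained configurations can be listed as:

```python
from itertools import product

# Hyper-parameter grid reported in the paper: number of log-bottlenecks B
# and the L1 sparsity penalty lambda_s. On the 13-data-set discovery set,
# the paper selects B = 5 and lambda_s = 0.01.
B_values = [1, 3, 5, 10]
lambda_s_values = [0.001, 0.01, 0.1, 1]

# Enumerate every (B, lambda_s) combination that would be trained;
# a real run would fit and score a model for each pair.
grid = list(product(B_values, lambda_s_values))

print(len(grid))          # 16 combinations
print((5, 0.01) in grid)  # the combination chosen in the paper
```

Each pair would then be evaluated on the discovery data sets, with the verification set reserved for the final benchmark.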