Generalizing Tree Probability Estimation via Bayesian Networks
Authors: Cheng Zhang, Frederick A. Matsen IV
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on both synthetic and real data show that our methods greatly outperform the current practice of using the empirical distribution, as well as a previous effort for probability estimation on trees. |
| Researcher Affiliation | Academia | Cheng Zhang, Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, chengz23@fredhutch.org; Frederick A. Matsen IV, Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, matsen@fredhutch.org |
| Pseudocode | Yes | Algorithm 1: Expectation Maximization for SBN (a hedged sketch of this EM procedure appears below the table) |
| Open Source Code | Yes | The code is made available at https://github.com/zcrabbit/sbn. |
| Open Datasets | Yes | We choose a tractable but challenging tree space, the space of unrooted trees with 8 leaves, which contains 10395 unique trees. ... We now investigate the performance on large unrooted tree space posterior estimation using 8 real datasets commonly used to benchmark phylogenetic MCMC methods [Lakner et al., 2008, Höhna and Drummond, 2012, Larget, 2013, Whidden and Matsen, 2015] (Table 1). |
| Dataset Splits | No | The paper mentions discarding 'the first 25% as burn-in' for MCMC samples, which is a preprocessing step rather than a data split. There is no explicit mention of a separate validation set or of train/validation/test splits for SBN training. (The burn-in step is sketched below the table.) |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU/CPU models, memory, or specific computing platforms used for experiments. |
| Software Dependencies | No | The paper mentions MrBayes but does not provide version numbers for it or for any other software libraries or dependencies used in the experiments. |
| Experiment Setup | Yes | For sbn-em-α, we use the sample frequency counts of the root splits and parent-child subsplit pairs as the equivalent sample counts (see Algorithm 1). ... We vary β and K to control the difficulty of the learning task, and average over 10 independent runs for each configuration. ... For each of these data sets, 10 single-chain MrBayes [Ronquist et al., 2012] replicates are run for one billion iterations and sampled every 1000 iterations, using the simple Jukes and Cantor [1969] substitution model. We discard the first 25% as burn-in for a total of 7.5 million posterior samples per data set. ... For sbn-em-α, we use α = 0.0001 to give some weak regularization. |
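
To make the Pseudocode and Experiment Setup rows concrete: Algorithm 1 in the paper is EM for subsplit Bayesian networks (SBNs), where the root placement of each sampled unrooted tree is the latent variable and α enters as equivalent sample counts. The paper's pseudocode is not reproduced here; the following is a minimal Python sketch under simplifying assumptions. It assumes each sampled unrooted tree has been pre-expanded into its candidate rootings, each rooting given as a root split plus a list of (parent subsplit, child subsplit) pairs, with all identifiers as opaque hashables. The function name `sbn_em` and this data representation are hypothetical, not the paper's API.

```python
from collections import defaultdict

def normalize(counts):
    """Normalize a dict of non-negative counts into a probability dict."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def sbn_em(trees, alpha=0.0, n_iters=100):
    """EM sketch for a subsplit Bayesian network over unrooted trees.

    `trees` is a list of (rootings, count) pairs; `rootings` lists every
    root placement of one sampled unrooted tree as
    (root_split, [(parent_subsplit, child_subsplit), ...]).
    `alpha` adds equivalent sample counts (pseudocounts) to every
    observed root split and parent-child pair, as in sbn-em-alpha.
    """
    # Collect the observed support: root splits and parent-child pairs.
    root_support = set()
    pair_support = defaultdict(set)
    for rootings, _ in trees:
        for root_split, pairs in rootings:
            root_support.add(root_split)
            for parent, child in pairs:
                pair_support[parent].add(child)

    # Initialize parameters uniformly over the observed support.
    p_root = normalize({s: 1.0 for s in root_support})
    p_cond = {par: normalize({ch: 1.0 for ch in children})
              for par, children in pair_support.items()}

    for _ in range(n_iters):
        root_counts = defaultdict(float)
        pair_counts = defaultdict(lambda: defaultdict(float))
        for rootings, count in trees:
            # E-step: posterior weight of each candidate root placement.
            weights = []
            for root_split, pairs in rootings:
                w = p_root[root_split]
                for parent, child in pairs:
                    w *= p_cond[parent][child]
                weights.append(w)
            total = sum(weights)
            if total == 0.0:  # degenerate case; skip this sample
                continue
            # Accumulate expected counts, weighted by sample frequency.
            for (root_split, pairs), w in zip(rootings, weights):
                resp = count * w / total
                root_counts[root_split] += resp
                for parent, child in pairs:
                    pair_counts[parent][child] += resp
        # M-step: re-estimate parameters with alpha pseudocounts.
        p_root = normalize({s: root_counts[s] + alpha for s in root_support})
        p_cond = {par: normalize({ch: pair_counts[par][ch] + alpha
                                  for ch in children})
                  for par, children in pair_support.items()}
    return p_root, p_cond

# Toy usage with hypothetical identifiers: two sampled "trees",
# each with two candidate rootings, observed 3 and 1 times.
trees = [
    ([("r1", [("p1", "c1")]), ("r2", [("p1", "c2")])], 3),
    ([("r1", [("p1", "c1")]), ("r3", [("p2", "c3")])], 1),
]
p_root, p_cond = sbn_em(trees, alpha=1e-4, n_iters=50)
```

Treating the root placement as unobserved is what lets a rooted SBN model score unrooted samples, and the α pseudocounts act like a symmetric prior over the observed support, matching the "equivalent sample counts" wording quoted above.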
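
For the empirical-distribution baseline and the 25% burn-in mentioned in the Dataset Splits and Experiment Setup rows, a minimal sketch follows. The helper name `empirical_tree_probs` is hypothetical, and topologies are assumed to be hashable identifiers listed in chain order.

```python
from collections import Counter

def empirical_tree_probs(topologies, burnin_frac=0.25):
    """Empirical-distribution baseline over sampled tree topologies.

    Discards the first `burnin_frac` of the MCMC chain as burn-in
    (25% in the paper) and normalizes the remaining topology counts.
    """
    kept = topologies[int(len(topologies) * burnin_frac):]
    counts = Counter(kept)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}
```

This baseline assigns zero probability to any topology absent from the post-burn-in sample, which is the limitation the paper's SBN estimators are designed to address.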