Generalizing Tree Probability Estimation via Bayesian Networks

Authors: Cheng Zhang, Frederick A Matsen IV

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on both synthetic and real data show that our methods greatly outperform the current practice of using the empirical distribution, as well as a previous effort for probability estimation on trees.
Researcher Affiliation | Academia | Cheng Zhang, Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, chengz23@fredhutch.org; Frederick A. Matsen IV, Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, matsen@fredhutch.org
Pseudocode | Yes | Algorithm 1: Expectation Maximization for SBN (an illustrative EM sketch appears after this table).
Open Source Code | Yes | The code is made available at https://github.com/zcrabbit/sbn.
Open Datasets | Yes | We choose a tractable but challenging tree space, the space of unrooted trees with 8 leaves, which contains 10395 unique trees. ... We now investigate the performance on large unrooted tree space posterior estimation using 8 real datasets commonly used to benchmark phylogenetic MCMC methods [Lakner et al., 2008, Höhna and Drummond, 2012, Larget, 2013, Whidden and Matsen, 2015] (Table 1). (The 10395-tree count is checked in a short sketch after the table.)
Dataset Splits | No | The paper mentions discarding 'the first 25% as burn-in' from the MCMC samples, which is a preprocessing step; however, there is no explicit mention of a separate validation set or of specific train/validation/test splits for SBN training.
Hardware Specification | No | The paper does not specify any hardware details such as GPU/CPU models, memory, or specific computing platforms used for the experiments.
Software Dependencies | No | The paper mentions MrBayes but does not provide specific version numbers for it or for any other software libraries or dependencies used in the experiments.
Experiment Setup | Yes | For sbn-em-α, we use the sample frequency counts of the root splits and parent-child subsplit pairs as the equivalent sample counts (see Algorithm 1). ... We vary β and K to control the difficulty of the learning task, and average over 10 independent runs for each configuration. ... For each of these data sets, 10 single-chain MrBayes [Ronquist et al., 2012] replicates are run for one billion iterations and sampled every 1000 iterations, using the simple Jukes and Cantor [1969] substitution model. We discard the first 25% as burn-in for a total of 7.5 million posterior samples per data set. ... For sbn-em-α, we use α = 0.0001 to give some weak regularization. (A burn-in/empirical-frequency sketch follows this table.)
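
The Pseudocode row refers to Algorithm 1 (Expectation Maximization for SBN). The following is a minimal, hypothetical sketch of that E-step/M-step structure, not the authors' implementation: it assumes the candidate rootings of each unrooted tree have already been enumerated as (root split, parent-child subsplit pair list) tuples, and it uses a flat pseudocount `alpha` as a stand-in for the equivalent-sample-count regularization used by sbn-em-α.

```python
from collections import defaultdict

def em_sbn(trees, n_iters=50, alpha=0.0):
    """Toy EM loop in the spirit of Algorithm 1 (illustrative only).

    `trees` is a list of (weight, rootings) pairs; each rooting is a
    (root_split, [(parent_subsplit, child_subsplit), ...]) tuple, assumed
    to have been enumerated beforehand.  `alpha` is a flat pseudocount
    standing in for the equivalent sample counts of sbn-em-alpha.
    """
    # Support of root splits and parent-child subsplit pairs seen in the data.
    root_support = {s for _, rootings in trees for s, _ in rootings}
    children_of = defaultdict(set)
    for _, rootings in trees:
        for _, pairs in rootings:
            for parent, child in pairs:
                children_of[parent].add(child)

    # Uniform initialization of root-split and conditional probabilities.
    p_root = {s: 1.0 / len(root_support) for s in root_support}
    p_cond = {}
    for parent, kids in children_of.items():
        for child in kids:
            p_cond[(parent, child)] = 1.0 / len(kids)

    for _ in range(n_iters):
        root_counts = defaultdict(float)
        pair_counts = defaultdict(float)
        for weight, rootings in trees:
            # E-step: responsibility of each candidate rooting of this tree.
            scores = []
            for root_split, pairs in rootings:
                score = p_root[root_split]
                for pc in pairs:
                    score *= p_cond[pc]
                scores.append(score)
            total = sum(scores)
            # Accumulate expected counts weighted by the responsibilities.
            for (root_split, pairs), score in zip(rootings, scores):
                gamma = weight * score / total
                root_counts[root_split] += gamma
                for pc in pairs:
                    pair_counts[pc] += gamma
        # M-step: renormalize the (regularized) expected counts.
        z = sum(root_counts[s] + alpha for s in root_support)
        p_root = {s: (root_counts[s] + alpha) / z for s in root_support}
        for parent, kids in children_of.items():
            z = sum(pair_counts[(parent, c)] + alpha for c in kids)
            for c in kids:
                p_cond[(parent, c)] = (pair_counts[(parent, c)] + alpha) / z
    return p_root, p_cond
```

In the actual SBN parameterization the conditional probabilities are indexed by the parent subsplit together with which side of the parent the child hangs on; the flat (parent, child) key here is a simplification for readability.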
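
As a quick check of the "10395 unique trees" figure quoted in the Open Datasets row: the number of unrooted binary tree topologies on n labeled leaves is (2n-5)!!, which for n = 8 gives 11!! = 10395. A small illustrative sketch:

```python
def num_unrooted_topologies(n_leaves):
    """Number of unrooted binary tree topologies on n labeled leaves: (2n-5)!!."""
    count = 1
    for k in range(3, 2 * n_leaves - 4, 2):  # multiply the odd factors 3, 5, ..., 2n-5
        count *= k
    return count

print(num_unrooted_topologies(8))  # 10395, the 8-leaf space used in the paper
```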
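
The Experiment Setup row quotes the 25% burn-in convention and the empirical-distribution baseline that the SBN estimators are compared against. Below is a hedged sketch of that preprocessing, assuming sampled topologies arrive as a list of hashable identifiers; it is an illustration, not the authors' pipeline.

```python
from collections import Counter

def empirical_tree_probs(samples, burn_in_frac=0.25):
    """Drop the first `burn_in_frac` of MCMC samples as burn-in and return
    the empirical probability of each remaining tree topology."""
    kept = samples[int(len(samples) * burn_in_frac):]
    counts = Counter(kept)
    total = sum(counts.values())
    return {tree: count / total for tree, count in counts.items()}

# Example with toy topology labels standing in for sampled trees.
probs = empirical_tree_probs(["t1", "t1", "t2", "t1", "t2", "t1", "t3", "t1"])
print(probs)  # e.g. {'t2': 0.333..., 't1': 0.5, 't3': 0.166...}
```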