Generalizing Tree Probability Estimation via Bayesian Networks

Authors: Cheng Zhang, Frederick A Matsen IV

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on both synthetic and real data show that our methods greatly outperform the current practice of using the empirical distribution, as well as a previous effort for probability estimation on trees.
Researcher Affiliation | Academia | Cheng Zhang, Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, chengz23@fredhutch.org; Frederick A. Matsen IV, Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, matsen@fredhutch.org
Pseudocode | Yes | Algorithm 1: Expectation Maximization for SBN (an illustrative EM sketch appears after this table).
Open Source Code | Yes | The code is made available at https://github.com/zcrabbit/sbn.
Open Datasets | Yes | We choose a tractable but challenging tree space, the space of unrooted trees with 8 leaves, which contains 10395 unique trees. ... We now investigate the performance on large unrooted tree space posterior estimation using 8 real datasets commonly used to benchmark phylogenetic MCMC methods [Lakner et al., 2008, Höhna and Drummond, 2012, Larget, 2013, Whidden and Matsen, 2015] (Table 1). (The 10395-tree count is checked in a short sketch after the table.)
Dataset Splits | No | The paper mentions discarding 'the first 25% as burn-in' from the MCMC samples, which is a preprocessing step; however, there is no explicit mention of a separate validation set or of specific train/validation/test splits for SBN training.
Hardware Specification | No | The paper does not specify any hardware details such as GPU/CPU models, memory, or specific computing platforms used for the experiments.
Software Dependencies | No | The paper mentions MrBayes but does not provide specific version numbers for it or for any other software libraries or dependencies used in the experiments.
Experiment Setup | Yes | For sbn-em-α, we use the sample frequency counts of the root splits and parent-child subsplit pairs as the equivalent sample counts (see Algorithm 1). ... We vary β and K to control the difficulty of the learning task, and average over 10 independent runs for each configuration. ... For each of these data sets, 10 single-chain MrBayes [Ronquist et al., 2012] replicates are run for one billion iterations and sampled every 1000 iterations, using the simple Jukes and Cantor [1969] substitution model. We discard the first 25% as burn-in for a total of 7.5 million posterior samples per data set. ... For sbn-em-α, we use α = 0.0001 to give some weak regularization. (A burn-in/empirical-frequency sketch follows this table.)
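
The Pseudocode row refers to Algorithm 1 (Expectation Maximization for SBN). The following is a minimal, hypothetical sketch of that E-step/M-step structure, not the authors' implementation: it assumes the candidate rootings of each unrooted tree have already been enumerated as (root split, parent-child subsplit pair list) tuples, and it uses a flat pseudocount `alpha` as a stand-in for the equivalent-sample-count regularization used by sbn-em-α.

```python
from collections import defaultdict

def em_sbn(trees, n_iters=50, alpha=0.0):
    """Toy EM loop in the spirit of Algorithm 1 (illustrative only).

    `trees` is a list of (weight, rootings) pairs; each rooting is a
    (root_split, [(parent_subsplit, child_subsplit), ...]) tuple, assumed
    to have been enumerated beforehand.  `alpha` is a flat pseudocount
    standing in for the equivalent sample counts of sbn-em-alpha.
    """
    # Support of root splits and parent-child subsplit pairs seen in the data.
    root_support = {s for _, rootings in trees for s, _ in rootings}
    children_of = defaultdict(set)
    for _, rootings in trees:
        for _, pairs in rootings:
            for parent, child in pairs:
                children_of[parent].add(child)

    # Uniform initialization of root-split and conditional probabilities.
    p_root = {s: 1.0 / len(root_support) for s in root_support}
    p_cond = {}
    for parent, kids in children_of.items():
        for child in kids:
            p_cond[(parent, child)] = 1.0 / len(kids)

    for _ in range(n_iters):
        root_counts = defaultdict(float)
        pair_counts = defaultdict(float)
        for weight, rootings in trees:
            # E-step: responsibility of each candidate rooting of this tree.
            scores = []
            for root_split, pairs in rootings:
                score = p_root[root_split]
                for pc in pairs:
                    score *= p_cond[pc]
                scores.append(score)
            total = sum(scores)
            # Accumulate expected counts weighted by the responsibilities.
            for (root_split, pairs), score in zip(rootings, scores):
                gamma = weight * score / total
                root_counts[root_split] += gamma
                for pc in pairs:
                    pair_counts[pc] += gamma
        # M-step: renormalize the (regularized) expected counts.
        z = sum(root_counts[s] + alpha for s in root_support)
        p_root = {s: (root_counts[s] + alpha) / z for s in root_support}
        for parent, kids in children_of.items():
            z = sum(pair_counts[(parent, c)] + alpha for c in kids)
            for c in kids:
                p_cond[(parent, c)] = (pair_counts[(parent, c)] + alpha) / z
    return p_root, p_cond
```

In the actual SBN parameterization the conditional probabilities are indexed by the parent subsplit together with which side of the parent the child hangs on; the flat (parent, child) key here is a simplification for readability.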
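
As a quick check of the "10395 unique trees" figure quoted in the Open Datasets row: the number of unrooted binary tree topologies on n labeled leaves is (2n-5)!!, which for n = 8 gives 11!! = 10395. A small illustrative sketch:

```python
def num_unrooted_topologies(n_leaves):
    """Number of unrooted binary tree topologies on n labeled leaves: (2n-5)!!."""
    count = 1
    for k in range(3, 2 * n_leaves - 4, 2):  # multiply the odd factors 3, 5, ..., 2n-5
        count *= k
    return count

print(num_unrooted_topologies(8))  # 10395, the 8-leaf space used in the paper
```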
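
The Experiment Setup row quotes the 25% burn-in convention and the empirical-distribution baseline that the SBN estimators are compared against. Below is a hedged sketch of that preprocessing, assuming sampled topologies arrive as a list of hashable identifiers; it is an illustration, not the authors' pipeline.

```python
from collections import Counter

def empirical_tree_probs(samples, burn_in_frac=0.25):
    """Drop the first `burn_in_frac` of MCMC samples as burn-in and return
    the empirical probability of each remaining tree topology."""
    kept = samples[int(len(samples) * burn_in_frac):]
    counts = Counter(kept)
    total = sum(counts.values())
    return {tree: count / total for tree, count in counts.items()}

# Example with toy topology labels standing in for sampled trees.
probs = empirical_tree_probs(["t1", "t1", "t2", "t1", "t2", "t1", "t3", "t1"])
print(probs)  # e.g. {'t2': 0.333..., 't1': 0.5, 't3': 0.166...}
```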