Improved Variational Bayesian Phylogenetic Inference with Normalizing Flows

Authors: Cheng Zhang

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show that VBPI-NF significantly improves upon the vanilla VBPI on a benchmark of challenging real data Bayesian phylogenetic inference problems. ... We performed experiments on 8 real datasets that are commonly used to benchmark Bayesian phylogenetic inference methods (Hedges et al., 1990; Garey et al., 1996; Yang and Yoder, 2003; Henk et al., 2003; Lakner et al., 2008; Zhang and Blackwell, 2001; Yoder and Yang, 2004; Rossman et al., 2001; Höhna and Drummond, 2012; Larget, 2013; Whidden and Matsen IV, 2015). ... Table 1 shows the estimates of the lower bounds (K = 1, 10) and the marginal likelihood from different variational approaches on the 8 benchmark datasets."
Researcher Affiliation | Academia | "Cheng Zhang, School of Mathematical Sciences and Center for Statistical Science, Peking University, Beijing, China. chengzhang@math.pku.edu.cn"
Pseudocode | Yes | "See algorithm 1 in the supplement for more details."
Open Source Code | Yes | "The code is available at https://github.com/zcrabbit/vbpi-nf."
Open Datasets | Yes | "We performed experiments on 8 real datasets that are commonly used to benchmark Bayesian phylogenetic inference methods (Hedges et al., 1990; Garey et al., 1996; Yang and Yoder, 2003; Henk et al., 2003; Lakner et al., 2008; Zhang and Blackwell, 2001; Yoder and Yang, 2004; Rossman et al., 2001; Höhna and Drummond, 2012; Larget, 2013; Whidden and Matsen IV, 2015). These datasets, which we will call DS1-8, consist of sequences from 27 to 64 eukaryote species with 378 to 2520 site observations (see Table 1 and Lakner et al. (2008))."
Dataset Splits | No | The paper describes optimizing a multi-sample lower bound and evaluating on benchmark datasets, but it does not specify explicit train/validation/test splits of the input sequence data, so the data partitioning cannot be reproduced.
Hardware Specification | No | "The author is grateful for the computational resources provided by the High-performance Computing Platform of Peking University." This statement does not provide specific hardware details such as GPU/CPU models or memory.
Software Dependencies | No | "All models were implemented in Pytorch (Paszke et al., 2019) with the Adam optimizer (Kingma and Ba, 2015)." Specific version numbers for PyTorch and other libraries are not provided.
Experiment Setup | Yes | "We set K = 10 for the multi-sample lower bound, with a schedule λn = min(1, 0.001 + n/100000), going from 0.001 to 1 after 100000 iterations. We evaluate the performance of our permutation equivariant normalizing flows with varying numbers of layers. All models were implemented in Pytorch (Paszke et al., 2019) with the Adam optimizer (Kingma and Ba, 2015)... Results were collected after 400,000 parameter updates."
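The setup quoted above combines two standard ingredients: a linear annealing schedule for the objective weight, and a K-sample (importance-weighted) lower bound. A minimal stdlib-only Python sketch of both is below; the function names are my own for illustration and are not taken from the paper's released code, which uses PyTorch.

```python
import math

def annealing_weight(n, init=0.001, horizon=100_000):
    # Schedule quoted from the paper: lambda_n = min(1, 0.001 + n/100000),
    # rising linearly from 0.001 to 1 over the first 100,000 iterations.
    return min(1.0, init + n / horizon)

def multi_sample_lower_bound(log_weights):
    # Generic K-sample lower bound estimate, log((1/K) * sum_i exp(log w_i)),
    # computed stably via the log-sum-exp trick. The paper sets K = 10.
    k = len(log_weights)
    m = max(log_weights)
    return m + math.log(sum(math.exp(lw - m) for lw in log_weights)) - math.log(k)
```

For example, `annealing_weight(50_000)` is 0.501, and with all K log-weights equal the bound reduces to that common value, as expected for an exact approximation.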