ARTree: A Deep Autoregressive Model for Phylogenetic Inference
Authors: Tianyu Xie, Cheng Zhang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness and efficiency of our method on a benchmark of challenging real data tree topology density estimation and variational Bayesian phylogenetic inference problems. ... In this section, we test the effectiveness and efficiency of ARTree for phylogenetic inference on two benchmark tasks: tree topology density estimation (TDE) and variational Bayesian phylogenetic inference (VBPI). |
| Researcher Affiliation | Academia | School of Mathematical Sciences, Peking University; Center for Statistical Science, Peking University; tianyuxie@pku.edu.cn, chengzhang@math.pku.edu.cn |
| Pseudocode | Yes | Algorithm 1: ARTree: An autoregressive model for phylogenetic tree topologies |
| Open Source Code | Yes | The code is available at https://github.com/tyuxie/ARTree. |
| Open Datasets | Yes | We perform experiments on eight data sets which we will call DS1-8. These data sets, consisting of sequences from 27 to 64 eukaryote species with 378 to 2520 site observations, are commonly used to benchmark phylogenetic MCMC methods (Hedges et al., 1990; Garey et al., 1996; Yang & Yoder, 2003; Henk et al., 2003; Lakner et al., 2008; Zhang & Blackwell, 2001; Yoder & Yang, 2004; Rossman et al., 2001; Höhna & Drummond, 2012; Larget, 2013; Whidden & Matsen IV, 2015). |
| Dataset Splits | No | The paper mentions collecting samples for training data and discarding burn-in samples, but it does not specify explicit training, validation, or test dataset splits (e.g., 80/10/10 percentages or specific sample counts). |
| Hardware Specification | Yes | All the experiments are run on an Intel Xeon Platinum 9242 processor. |
| Software Dependencies | No | The paper mentions 'All models are implemented in PyTorch' and 'trained with the Adam optimizer', but does not provide specific version numbers for PyTorch or other key software dependencies. |
| Experiment Setup | Yes | All GNNs have L = 2 rounds in the message passing step. All the activation functions in MLPs are exponential linear units (ELUs) (Clevert et al., 2015). The taxa order is set to the lexicographical order of the corresponding species names in all experiments except the ablation studies. All models are implemented in PyTorch (Paszke et al., 2019) and trained with the Adam (Kingma & Ba, 2015) optimizer. The learning rate is 0.001 for SBNs, 0.0001 for ARTree, and 0.001 for the branch length model. For both ARTree and SBN-SGA, the results are collected after 200000 parameter updates with batch size B = 10. We set K = 10 for the multi-sample lower bound (4) and use the following annealed unnormalized posterior at the i-th iteration: $p(Y, \tau, q; \beta_i) = p(Y \mid \tau, q)^{\beta_i}\, p(\tau, q)$, where $\beta_i = \min\{1.0,\, 0.001 + i/H\}$ is the inverse temperature that goes from 0.001 to 1 after H iterations. For ARTree, a long annealing period H = 200000 is used for DS6 and DS7... and H = 100000 is used for the other data sets. |
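
For readers reproducing the VBPI setup, the annealing schedule and optimizer configuration quoted in the last table row translate directly into a few lines of PyTorch. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: `artree_model`, `branch_length_model`, and `annealed_log_posterior` are hypothetical placeholders, and the multi-sample bound is written in its standard importance-weighted form, which is assumed (but not confirmed here) to match equation (4) of the paper.

```python
import math
import torch
import torch.nn as nn

def inverse_temperature(i: int, H: int) -> float:
    """Annealing schedule quoted above: beta_i = min(1.0, 0.001 + i/H)."""
    return min(1.0, 0.001 + i / H)

def annealed_log_posterior(log_likelihood: torch.Tensor,
                           log_prior: torch.Tensor,
                           beta: float) -> torch.Tensor:
    """log p(Y, tau, q; beta) = beta * log p(Y | tau, q) + log p(tau, q)."""
    return beta * log_likelihood + log_prior

def multi_sample_lower_bound(log_weights: torch.Tensor) -> torch.Tensor:
    """K-sample lower bound log((1/K) * sum_k w_k), computed stably.

    `log_weights` holds K importance log-weights log(p/q); the paper sets
    K = 10. Assumes the paper's bound (4) has this standard form.
    """
    K = log_weights.shape[0]
    return torch.logsumexp(log_weights, dim=0) - math.log(K)

# Hypothetical placeholders for the actual GNN-based topology model and
# the branch-length model; the names are illustrative, not from the repo.
artree_model = nn.Linear(8, 8)
branch_length_model = nn.Linear(8, 1)

# Per-module learning rates as reported in the table above.
optimizer = torch.optim.Adam([
    {"params": artree_model.parameters(), "lr": 1e-4},         # ARTree: 0.0001
    {"params": branch_length_model.parameters(), "lr": 1e-3},  # branch lengths: 0.001
])

H = 100_000  # annealing horizon (200000 for DS6 and DS7 per the paper)
for i in (0, 50_000, 100_000, 150_000):
    print(f"iter {i}: beta = {inverse_temperature(i, H):.3f}")
```

In a full training loop, each of the B = 10 batch elements would contribute K = 10 sampled topologies and branch lengths, with the annealed log-posterior substituted for the exact one when forming the importance log-weights passed to `multi_sample_lower_bound`.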