Learnable Topological Features For Phylogenetic Inference via Graph Neural Networks
Authors: Cheng Zhang
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness and efficiency of our method on a simulated data tree probability estimation task and a benchmark of challenging real data variational Bayesian phylogenetic inference problems. |
| Researcher Affiliation | Academia | Cheng Zhang, School of Mathematical Sciences and Center for Statistical Science, Peking University, Beijing, China; chengzhang@math.pku.edu.cn |
| Pseudocode | Yes | Algorithm 1: A Two-pass Algorithm for Interior Node Embedding (an illustrative sketch follows the table). |
| Open Source Code | Yes | The code is available at https://github.com/zcrabbit/vbpi-gnn. |
| Open Datasets | Yes | All methods were evaluated on 8 real datasets that are commonly used to benchmark Bayesian phylogenetic inference methods (Hedges et al., 1990; Garey et al., 1996; Yang & Yoder, 2003; Henk et al., 2003; Lakner et al., 2008; Zhang & Blackwell, 2001; Yoder & Yang, 2004; Rossman et al., 2001; Höhna & Drummond, 2012; Larget, 2013; Whidden & Matsen IV, 2015). These datasets, which we call DS1-8, consist of sequences from 27 to 64 eukaryote species with 378 to 2520 site observations. |
| Dataset Splits | No | No explicit training/validation/test data splits (e.g., percentages or counts) are mentioned in the paper for model evaluation. The paper describes simulated and real datasets used, and how models were trained (e.g., using NCE loss, parameter updates), but not how data was partitioned for validation/testing. |
| Hardware Specification | No | The author is grateful for the computational resources provided by the High-performance Computing Platform of Peking University. |
| Software Dependencies | No | All models were implemented in PyTorch (Paszke et al., 2019) with the Adam optimizer (Kingma & Ba, 2015). |
| Experiment Setup | Yes | All GNN variants have 2 GNN layers (including the input layer), and all involved MLPs have 2 layers. We used summation as our permutation invariant aggregation function for graph-level features and maximization for edge-level features. All models were implemented in PyTorch (Paszke et al., 2019) with the Adam optimizer (Kingma & Ba, 2015). We set K = 10 for the multi-sample lower bound, with a schedule λ_n = min(1, 0.001 + n/100000), going from 0.001 to 1 after 100000 iterations. The Monte Carlo gradient estimates for the tree topology parameters and branch length parameters were obtained via VIMCO (Mnih & Rezende, 2016) and the reparameterization trick (Kingma & Welling, 2014) respectively. Results were collected after 400,000 parameter updates. (A runnable rendering of the annealing schedule follows the table.) |
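
The Pseudocode row refers to the paper's Algorithm 1, a two-pass traversal that assigns embeddings to the interior nodes of a phylogenetic tree. Below is a minimal illustrative sketch of the two-pass idea, assuming a simple mean/mixing update of our own choosing; the function name `two_pass_embed`, the adjacency and feature representations, and the update rules are placeholders, not the paper's actual equations (see the repository at https://github.com/zcrabbit/vbpi-gnn for the real implementation).

```python
import torch

def two_pass_embed(neighbors, leaf_feats):
    """Hypothetical two-pass interior node embedding on a tree.

    neighbors: dict mapping each node to its adjacent nodes.
    leaf_feats: dict mapping each leaf to its (d,)-dim feature tensor.
    Returns a dict with an embedding for every node in the tree.
    """
    emb = {v: f.clone() for v, f in leaf_feats.items()}
    root = next(iter(leaf_feats))             # root the tree at any leaf
    # Pass 1 (leaves -> root): record a parent-before-children order,
    # then fill interior embeddings in reverse so children come first.
    order, parent, stack = [], {root: None}, [root]
    while stack:
        v = stack.pop()
        order.append(v)
        for u in neighbors[v]:
            if u != parent[v]:
                parent[u] = v
                stack.append(u)
    for v in reversed(order):
        if v not in emb:                       # interior node
            kids = [u for u in neighbors[v] if u != parent[v]]
            emb[v] = torch.stack([emb[u] for u in kids]).mean(0)
    # Pass 2 (root -> leaves): mix the parent's embedding back in so each
    # interior node sees information from every direction of the tree.
    for v in order:
        p = parent[v]
        if p is not None and v not in leaf_feats:
            emb[v] = 0.5 * (emb[v] + emb[p])
    return emb

# Example on a quartet tree with interior nodes "x" and "y":
#   a - x - y - c,  with b attached to x and d attached to y
nbrs = {"a": ["x"], "b": ["x"], "c": ["y"], "d": ["y"],
        "x": ["a", "b", "y"], "y": ["c", "d", "x"]}
feats = {v: torch.randn(4) for v in "abcd"}
emb = two_pass_embed(nbrs, feats)
```

The point of the two passes is that after the upward sweep each interior node only summarizes its own subtree; the downward sweep mixes in the rest of the tree, so every interior embedding reflects the full topology in a single linear-time pass pair.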
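
The Experiment Setup row quotes the annealing schedule for the multi-sample lower bound. A minimal runnable rendering of that schedule (the function name is our own; only the formula comes from the paper):

```python
def annealing_weight(n: int) -> float:
    """Annealed weight lambda_n = min(1, 0.001 + n / 100000) for the
    multi-sample lower bound, rising from 0.001 to 1 over the first
    100,000 iterations and staying at 1 thereafter."""
    return min(1.0, 0.001 + n / 100_000)

assert annealing_weight(0) == 0.001
assert annealing_weight(100_000) == 1.0
assert annealing_weight(400_000) == 1.0  # results collected after 400k updates
```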