Learnable Topological Features For Phylogenetic Inference via Graph Neural Networks

Authors: Cheng Zhang

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness and efficiency of our method on a simulated data tree probability estimation task and a benchmark of challenging real data variational Bayesian phylogenetic inference problems.
Researcher Affiliation | Academia | Cheng Zhang, School of Mathematical Sciences and Center for Statistical Science, Peking University, Beijing, China, chengzhang@math.pku.edu.cn
Pseudocode | Yes | Algorithm 1: A Two-pass Algorithm for Interior Node Embedding (a hypothetical reconstruction is sketched after this table)
Open Source Code | Yes | The code is available at https://github.com/zcrabbit/vbpi-gnn.
Open Datasets | Yes | All methods were evaluated on 8 real datasets that are commonly used to benchmark Bayesian phylogenetic inference methods (Hedges et al., 1990; Garey et al., 1996; Yang & Yoder, 2003; Henk et al., 2003; Lakner et al., 2008; Zhang & Blackwell, 2001; Yoder & Yang, 2004; Rossman et al., 2001; Höhna & Drummond, 2012; Larget, 2013; Whidden & Matsen IV, 2015). These datasets, which we call DS1-8, consist of sequences from 27 to 64 eukaryote species with 378 to 2520 site observations.
Dataset Splits | No | No explicit training/validation/test splits (e.g., percentages or counts) are reported for model evaluation. The paper describes the simulated and real datasets and how the models were trained (e.g., the NCE loss and the number of parameter updates), but not how the data were partitioned for validation or testing.
Hardware Specification | No | The author is grateful for the computational resources provided by the High-performance Computing Platform of Peking University.
Software Dependencies | No | All models were implemented in PyTorch (Paszke et al., 2019) with the Adam optimizer (Kingma & Ba, 2015).
Experiment Setup | Yes | All GNN variants have 2 GNN layers (including the input layer), and all involved MLPs have 2 layers. We used summation as the permutation-invariant aggregation function for graph-level features and maximization for edge-level features. All models were implemented in PyTorch (Paszke et al., 2019) with the Adam optimizer (Kingma & Ba, 2015). We set K = 10 for the multi-sample lower bound, with an annealing schedule λ_n = min(1, 0.001 + n/100000) that goes from 0.001 to 1 over the first 100,000 iterations. The Monte Carlo gradient estimates for the tree topology parameters and the branch length parameters were obtained via VIMCO (Mnih & Rezende, 2016) and the reparameterization trick (Kingma & Welling, 2014), respectively. Results were collected after 400,000 parameter updates. (The schedule and lower bound are sketched in the second code example below.)
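
Algorithm 1 itself is not reproduced on this page, but a two-pass interior node embedding admits a compact implementation. The sketch below is a hypothetical reconstruction, assuming the interior embeddings minimize a Dirichlet energy over the tree so that each interior node's embedding equals the mean of its neighbors'; the function two_pass_embedding, the adjacency-dict tree representation, and all variable names are our own illustration, not the paper's code.

    import numpy as np

    def two_pass_embedding(adj, leaf_feat):
        """Solve x_u = mean of x_v over neighbors v for every interior node u,
        with leaf embeddings held fixed, using two tree traversals.

        adj: dict mapping each node to its list of neighbors (an unrooted tree).
        leaf_feat: dict mapping each leaf to its fixed np.ndarray feature.
        """
        root = next(iter(leaf_feat))          # root the tree at an arbitrary leaf
        parent, order, stack = {root: None}, [], [root]
        while stack:                          # iterative DFS: parents precede children
            u = stack.pop()
            order.append(u)
            for v in adj[u]:
                if v != parent[u]:
                    parent[v] = u
                    stack.append(v)
        # Pass 1 (leaves to root): write x_u = a[u] + b[u] * x_parent(u).
        a, b = {}, {}
        for u in reversed(order):
            if u in leaf_feat:
                a[u], b[u] = leaf_feat[u], 0.0
            else:
                kids = [v for v in adj[u] if v != parent[u]]
                denom = len(adj[u]) - sum(b[c] for c in kids)
                a[u] = sum(a[c] for c in kids) / denom
                b[u] = 1.0 / denom
        # Pass 2 (root to leaves): substitute parents' values top-down.
        x = {root: leaf_feat[root]}
        for u in order[1:]:
            x[u] = a[u] + b[u] * x[parent[u]]
        return x

    # Example: a quartet tree with leaves 0-3 and interior nodes 4, 5.
    adj = {0: [4], 1: [4], 2: [5], 3: [5], 4: [0, 1, 5], 5: [2, 3, 4]}
    emb = two_pass_embedding(adj, {i: np.eye(4)[i] for i in range(4)})

The leaf-to-root pass expresses each interior embedding affinely in its parent's embedding; the root-to-leaf pass then resolves the values, so the fixed-point system is solved exactly in two traversals, i.e., in time linear in the number of nodes.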
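
For the experiment setup row above, here is a minimal PyTorch sketch of the λ_n schedule and an annealed K-sample lower bound. We assume, as in standard VBPI, that λ_n multiplies only the log-likelihood term of the importance weights; the tensor names log_lik, log_prior, and log_q are illustrative placeholders, not the paper's API.

    import math
    import torch

    K = 10  # number of samples in the multi-sample lower bound

    def annealing_weight(n, warmup=100000):
        # lambda_n = min(1, 0.001 + n / 100000): ramps from 0.001 to 1.
        return min(1.0, 0.001 + n / warmup)

    def annealed_lower_bound(log_lik, log_prior, log_q, n):
        # log_lik, log_prior, log_q: shape-(K,) tensors for K samples from the
        # variational approximation (illustrative names). Only the likelihood
        # term is annealed by lambda_n, as in standard VBPI (an assumption here).
        log_w = annealing_weight(n) * log_lik + log_prior - log_q
        return torch.logsumexp(log_w, dim=0) - math.log(K)

VIMCO gradients for the tree topology parameters and reparameterization gradients for the branch lengths would both be derived from these log-weights, with Adam applied to the resulting estimates.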