PhyloGFN: Phylogenetic inference with generative flow networks
Authors: Mingyang Zhou, Zichao Yan, Elliot Layne, Nikolay Malkin, Dinghuai Zhang, Moksh Jain, Mathieu Blanchette, Yoshua Bengio
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that our amortized posterior sampler, PhyloGFN, produces diverse and high-quality evolutionary hypotheses on real benchmark datasets. and We evaluate PhyloGFN on a suite of 8 real-world benchmark datasets (Table S1 in Appendix C) that is standard in the literature. |
| Researcher Affiliation | Academia | Mingyang Zhou* (McGill University); Zichao Yan* (McGill University; Université de Montréal, Mila); Elliot Layne (McGill University, Mila); Nikolay Malkin (Université de Montréal, Mila); Dinghuai Zhang (Université de Montréal, Mila); Moksh Jain (Université de Montréal, Mila); Mathieu Blanchette (McGill University, Mila); Yoshua Bengio (Université de Montréal, Mila) |
| Pseudocode | No | The paper describes the methods in narrative text and figures (e.g., Figure 1) but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and data are available https://github.com/zmy1116/phylogfn. See Appendix J for a detailed description. |
| Open Datasets | Yes | We evaluate PhyloGFN on a suite of 8 real-world benchmark datasets (Table S1 in Appendix C) that is standard in the literature. and Table S1 (statistics of the benchmark datasets DS1-DS8, as # species / # sites): DS1: 27 / 1949 (Hedges et al., 1990); DS2: 29 / 2520 (Garey et al., 1996); DS3: 36 / 1812 (Yang & Yoder, 2003); DS4: 41 / 1137 (Henk et al., 2003); DS5: 50 / 378 (Lakner et al., 2008); DS6: 50 / 1133 (Zhang & Blackwell, 2001); DS7: 59 / 1824 (Yoder & Yang, 2004); DS8: 64 / 1008 (Rossman et al., 2001) |
| Dataset Splits | No | The paper mentions training data (e.g., 'trained on a total of 32 million examples') and test results (e.g., MLL estimation, comparisons), but does not explicitly define or specify training, validation, and test splits (percentages or counts) or reference standard predefined splits. |
| Hardware Specification | Yes | PhyloGFN is trained on virtual machines equipped with 10 CPU cores and 10GB RAM for all datasets. We use one V100 GPU for datasets DS1-DS6 and one A100 GPU for DS7-DS8, although the choice of hardware is not essential for running our training algorithms. |
| Software Dependencies | No | The paper mentions using 'Anaconda' for MrBayes installation and refers to Python scripts in the supplementary material, but it does not specify version numbers for key software dependencies or libraries used in their own implementation (e.g., Python, PyTorch, TensorFlow versions, or other specific packages with their versions). |
| Experiment Setup | Yes | For PhyloGFN-Bayesian, our models are trained with fixed 500 epochs. For PhyloGFN-Parsimony, our models are trained until the probability of sampling the optimal trees, or the most parsimonious trees our PhyloGFN has seen so far, is above 0.001. Each training epoch consists of 1000 gradient update steps using a batch of 64 trajectories. and Table S3 (common hyperparameters for PhyloGFN-Bayesian and PhyloGFN-Parsimony): Transformer encoder hidden size 128; number of layers 6; number of heads 4; learning rate (model) 5e-5; learning rate (Z) 5e-3 |
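
To make the reported training schedule concrete, below is a minimal Python sketch that mirrors the hyperparameters quoted above (500 epochs, 1000 gradient-update steps per epoch, batches of 64 trajectories, a 6-layer / 4-head Transformer encoder with hidden size 128, learning rates 5e-5 for the model and 5e-3 for Z). It is an illustration only, not the released PhyloGFN implementation; `sample_trajectories` and `gradient_update` are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class PhyloGFNConfig:
    # Transformer encoder settings reported in Table S3
    hidden_size: int = 128
    num_layers: int = 6
    num_heads: int = 4
    # Learning rates reported in Table S3
    lr_model: float = 5e-5
    lr_log_z: float = 5e-3
    # Training schedule reported for PhyloGFN-Bayesian
    epochs: int = 500
    steps_per_epoch: int = 1000
    batch_size: int = 64


def sample_trajectories(batch_size: int) -> list:
    """Hypothetical placeholder: sample `batch_size` tree-construction trajectories."""
    return [object() for _ in range(batch_size)]


def gradient_update(trajectories: list, config: PhyloGFNConfig) -> float:
    """Hypothetical placeholder: perform one gradient update step and return the loss."""
    return 0.0


def train(config: PhyloGFNConfig) -> None:
    # 500 epochs x 1000 steps x 64 trajectories = 32,000,000 trajectories,
    # consistent with the "32 million examples" quoted under Dataset Splits.
    for epoch in range(config.epochs):
        for step in range(config.steps_per_epoch):
            batch = sample_trajectories(config.batch_size)
            loss = gradient_update(batch, config)


if __name__ == "__main__":
    train(PhyloGFNConfig())
```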