PhyloGen: Language Model-Enhanced Phylogenetic Inference via Graph Structure Generation

Authors: ChenRui Duan, Zelin Zang, Siyuan Li, Yongjie Xu, Stan Z. Li

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness and robustness of PhyloGen on eight real-world benchmark datasets. Visualization results confirm PhyloGen provides deeper insights into phylogenetic relationships.
Researcher Affiliation | Academia | (1) Zhejiang University, College of Computer Science and Technology; (2) Westlake University
Pseudocode | Yes | Algorithm 1: Phylogenetic Tree Generation
Open Source Code | No | We are not providing the source code now and will release the full code if the paper is accepted.
Open Datasets | Yes | Our model PhyloGen performs phylogenetic inference on biological sequence datasets of 27 to 64 species compiled in [17]. In Tab. 6, we summarize the statistics of benchmark datasets.
Dataset Splits | No | The paper does not explicitly provide the training/validation/test dataset splits needed to reproduce the experiment, such as percentages or specific sample counts for each split.
Hardware Specification | No | The paper only states general hardware like 'GPUs' without providing specific models (e.g., 'NVIDIA A100') or processor details. It mentions: 'We thank the AI Station of Westlake University for the support of GPUs.'
Software Dependencies | No | The paper mentions 'Pytorch [27]' but does not provide a specific version number for it or for other key libraries.
Experiment Setup | Yes | Table 7 (training settings of PhyloGen): Adam optimizer, learning rate 1e-4, Step Learning Rate schedule, weight decay 0.0, momentum 0.9, eta_min 1e-6, base_lr 1e-4, max_lr 0.001, scheduler gamma 0.75, annealing init 0.001, annealing steps 100,000. Table 8 (common hyperparameters for PhyloGen): TopoNet hidden dim. 256, 2 layers, output dim. 4; Tree Encoder hidden dim. 256, 2 layers; Tree Decoder hidden dim. 256, 2 layers; DGCNN 2 layers. See the configuration sketch below the table.
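The reported settings map onto standard PyTorch constructs. Below is a minimal sketch, assuming PyTorch semantics for the values in Tables 7 and 8; the two-layer model is a hypothetical stand-in (PhyloGen's code is unreleased), the StepLR step size is not reported, and the linear shape of the annealing schedule is an assumption.

```python
import torch

# Hypothetical stand-in for PhyloGen's networks, which are not released;
# dimensions follow Table 8 (hidden dim. 256, output dim. 4, 2 layers).
model = torch.nn.Sequential(
    torch.nn.Linear(256, 256),  # hidden dim. 256 (Table 8)
    torch.nn.ReLU(),
    torch.nn.Linear(256, 4),    # output dim. 4 (Table 8)
)

# Table 7: Adam optimizer, learning rate 1e-4, weight decay 0.0.
# Adam's default beta1 = 0.9 already matches the reported momentum of 0.9.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=0.0)

# Table 7: Step Learning Rate schedule with gamma = 0.75; the step size
# is not reported in the paper, so step_size=1 here is an assumption.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.75)

def annealing_weight(step: int, init: float = 0.001, total: int = 100_000) -> float:
    """Annealing coefficient ramped from `init` to 1.0 over `total` steps.

    Table 7 reports only 'annealing init 0.001' and 'annealing steps
    100,000'; the linear ramp used here is an assumption.
    """
    return min(1.0, init + (1.0 - init) * step / total)
```

Note that the sketch omits the paper's loss and data pipeline entirely; it only shows how the tabulated optimizer, scheduler, and annealing values could be instantiated in PyTorch.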