PhyloGen: Language Model-Enhanced Phylogenetic Inference via Graph Structure Generation
Authors: ChenRui Duan, Zelin Zang, Siyuan Li, Yongjie Xu, Stan Z. Li
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness and robustness of PhyloGen on eight real-world benchmark datasets. Visualization results confirm PhyloGen provides deeper insights into phylogenetic relationships. |
| Researcher Affiliation | Academia | 1Zhejiang University, College of Computer Science and Technology; 2Westlake University |
| Pseudocode | Yes | Algorithm 1 Phylogenetic Tree Generation |
| Open Source Code | No | We are not providing the source code now and will release the full code if it is acceptable. |
| Open Datasets | Yes | Our model PhyloGen performs phylogenetic inference on biological sequence datasets of 27 to 64 species compiled in [17]. In Tab. 6, we summarize the statistics of benchmark datasets. |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits needed to reproduce the experiment, such as percentages or specific sample counts for each split. |
| Hardware Specification | No | The paper only states general hardware like 'GPUs' without providing specific models (e.g., 'NVIDIA A100') or processor details. It mentions: 'We thank the AI Station of Westlake University for the support of GPUs.' |
| Software Dependencies | No | The paper mentions 'Pytorch [27]' but does not provide a specific version number for it or other key libraries. |
| Experiment Setup | Yes | Table 7: Training Settings of PhyloGen provides: Optimizer Adam optimizer, Learning rate 1e-4, Schedule Step Learning Rate, Weight Decay 0.0, momentum 0.9, eta_min 1e-6, base_lr 1e-4, max_lr 0.001, scheduler.gamma 0.75, annealing init 0.001, annealing steps 100,000. Table 8: Common Hyperparameters for PhyloGen specifies: Topo Net Hidden Dim. 256, # Layer 2, Output Dim. 4, Tree Encoder Hidden Dim. 256, # Layer 2, Tree Decoder Hidden Dim. 256, # Layer 2, DGCNN # Layer 2. |
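The reported step-decay schedule (base_lr 1e-4, gamma 0.75, floor eta_min 1e-6) can be sketched in pure Python to show how the learning rate would evolve; note the paper's tables do not state the step size, so the `step_size` parameter below is a hypothetical placeholder, not a value from the paper.

```python
def step_lr(epoch, base_lr=1e-4, gamma=0.75, step_size=10, eta_min=1e-6):
    """Step learning-rate decay as described in the training settings.

    Every `step_size` epochs the rate is multiplied by `gamma`, and it is
    clamped from below at `eta_min`. `step_size=10` is an assumed value;
    the paper does not report it.
    """
    return max(base_lr * gamma ** (epoch // step_size), eta_min)


# Illustrative trajectory over the first few decay steps.
for epoch in (0, 10, 20, 30):
    print(f"epoch {epoch:3d}: lr = {step_lr(epoch):.2e}")
```

In a PyTorch training loop this corresponds to pairing `torch.optim.Adam(..., lr=1e-4, weight_decay=0.0)` with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=..., gamma=0.75)`; the listed `momentum 0.9` would map onto Adam's first-moment coefficient (beta1).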