Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

MDTree: A Masked Dynamic Autoregressive Model for Phylogenetic Inference

Authors: Zelin Zang, Chenrui Duan, Siyuan Li, Jinlin Wu, Bingo Wing-Kuen Ling, Fuji Yang, Jiebo Luo, Zhen Lei, Stan Z. Li

TMLR 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experiments on standard benchmarks demonstrate that MDTree outperforms existing methods in accuracy and runtime while producing biologically coherent phylogenies, providing a scalable solution for large-scale evolutionary analysis. |
| Researcher Affiliation | Academia | Zelin Zang EMAIL Centre for Artificial Intelligence and Robotics (CAIR), Hong Kong Institute of Science and Innovation (HKISI), School of Engineering, Westlake University; Chenrui Duan EMAIL School of Engineering, Westlake University; Siyuan Li EMAIL School of Engineering, Westlake University; Jinlin Wu EMAIL Centre for Artificial Intelligence and Robotics (CAIR), Hong Kong Institute of Science and Innovation (HKISI); Bingo Wing-Kuen Ling EMAIL Center for the Integrated Circuits and Artificial Intelligence, Tsientang Institute for Advanced Study, Zhejiang 310024; Fuji Yang EMAIL Center for the Integrated Circuits and Artificial Intelligence, Tsientang Institute for Advanced Study; Jiebo Luo EMAIL Hong Kong Institute of Science and Innovation (HKISI); Zhen Lei EMAIL Centre for Artificial Intelligence and Robotics (CAIR), Hong Kong Institute of Science and Innovation (HKISI), Institute of Automation, Chinese Academy of Sciences (CASIA), School of Artificial Intelligence, University of Chinese Academy of Sciences (UCAS); Stan Z. Li EMAIL School of Engineering, Westlake University, Hangzhou, Zhejiang 310030, China |
| Pseudocode | No | The paper describes the methodology using prose and diagrams (e.g., Figure 3: Framework of MDTree for dynamic autoregressive tree generation), but does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an unambiguous statement of releasing code or a direct link to a source-code repository. |
| Open Datasets | Yes | Experiments on phylogenetic benchmarks show that MDTree outperforms existing methods in accuracy and efficiency. Empirical analysis of Angiosperms353 (Zuntini et al., 2024) further demonstrates its ability to recover evolutionary lineages, including Rosaceae and Moraceae, suggesting broader biological applications. Evaluation Tasks and Datasets. We assess MDTree's performance on two key tasks: TDE, which focuses on optimizing tree topologies with MLL metric, and VBPI, where tree topologies and branch lengths are jointly inferred, using ELBO and MLL. These evaluations span eight diverse benchmark datasets, covering various organisms like marine animals, plants, bacteria, fungi, and eukaryotes, as outlined in Appendix C. |
| Dataset Splits | No | The paper states: 'These evaluations span eight diverse benchmark datasets, covering various organisms like marine animals, plants, bacteria, fungi, and eukaryotes, as outlined in Appendix C.' However, the provided text does not contain Appendix C, nor does it explicitly mention specific training, validation, or test splits for any of the datasets. |
| Hardware Specification | No | The paper states: 'We thank the AI Station of Westlake University for the support of GPUs.' This is a general acknowledgment and does not provide specific hardware details such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions several models and tools (e.g., DNABERT2, HyenaDNA, NT, PAUP*, MrBayes, LAX model) and cites PyTorch in the bibliography, but it does not list the software dependencies, with specific version numbers, required for reproducibility. |
| Experiment Setup | No | The paper states: 'All training details and hyperparameters are provided in Appendix E.' However, the provided text does not contain Appendix E, nor does the main body explicitly detail hyperparameter values, optimizer settings, or other specific experimental setup configurations. |