Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Finding Most Compatible Phylogenetic Trees over Multi-State Characters
Authors: Tuukka Korhonen, Matti Järvisalo1544-1551
AAAI 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically our approach outperforms in scalability the earlier proposed approaches w.r.t. various parameters underlying the problem. We empirically evaluate the performance of our hybrid BT-Max SAT approach. |
| Researcher Affiliation | Academia | Tuukka Korhonen, Matti J arvisalo HIIT, Department of Computer Science, University of Helsinki, Finland |
| Pseudocode | No | The paper describes algorithms and methods but does not provide any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our implementation, all test data and detailed results are available via https://github.com/Laakeri/ phylogeny-aaai. |
| Open Datasets | Yes | We generated a comprehensive set of benchmarks using the ms data generator (Hudson 2002) employed in various empirical evaluations of algorithms for maximum compatibility and perfect phylogeny (Gusfield, Frid, and Brown 2007; Gusfield 2010; Stevens and Gusfield 2010; Gysel and Gusfield 2011; Gysel, Gusfield, and Stevens 2013; Coulombe, Stevens, and Gusfield 2015) extended to multistate instances as described in (Gusfield 2010). |
| Dataset Splits | No | The paper describes how instances were generated and parameters varied, but does not specify explicit training, validation, and test dataset splits with percentages or counts. |
| Hardware Specification | Yes | The experiments were run single-threaded on computers with 2.4-GHz Intel Xeon E5-2680-v4 processors with a per-instance 2-hour time limit and 32-GB memory limit. |
| Software Dependencies | Yes | For the PBO approach, we used the code provided by the authors, in particular the PBO-RR variant which includes additional constraints to strengthen the PBO formulation, which we observed to be the best-performing variant in preliminary experiments. For the experiments on PBO, we used Minisat+ (E en and Sorensson 2006) as the PBO solver. For both of the IP-based approaches, we used CPLEX 12.7.1 (IBM ILOG 2017) as the IP solver in the experiments. We implemented our BT-Max SAT approach based on the Triangulator implementation of BT (Korhonen, Berg, and J arvisalo 2019), and used Max HS (Davies and Bacchus 2013) as the Max SAT solver. |
| Experiment Setup | Yes | We generated a comprehensive set of benchmarks using the ms data generator (Hudson 2002) employed in various empirical evaluations of algorithms for maximum compatibility and perfect phylogeny (Gusfield, Frid, and Brown 2007; Gusfield 2010; Stevens and Gusfield 2010; Gysel and Gusfield 2011; Gysel, Gusfield, and Stevens 2013; Coulombe, Stevens, and Gusfield 2015) extended to multistate instances as described in (Gusfield 2010). The generator generates instances based on four parameters: n, the number of taxa; m, the number of characters; k, the number of states for each character; and r, the recombination parameter that controls how far the data is from admitting a perfect phylogeny. Following Gusfield (2010) we set n = m. For the parameter k, the values 4 and 20 are biologically relevant (nucleotide and amino acid) (Wiley and Lieberman 2011). Setting r = 0 guarantees a perfect phylogeny; we let r vary from 0 to 4. |