Finding Most Compatible Phylogenetic Trees over Multi-State Characters

Authors: Tuukka Korhonen, Matti J„ärvisalo1544-1551

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Empirically our approach outperforms in scalability the earlier proposed approaches w.r.t. various parameters underlying the problem. We empirically evaluate the performance of our hybrid BT-Max SAT approach.
Researcher Affiliation Academia Tuukka Korhonen, Matti J arvisalo HIIT, Department of Computer Science, University of Helsinki, Finland
Pseudocode No The paper describes algorithms and methods but does not provide any structured pseudocode or algorithm blocks.
Open Source Code Yes Our implementation, all test data and detailed results are available via https://github.com/Laakeri/ phylogeny-aaai.
Open Datasets Yes We generated a comprehensive set of benchmarks using the ms data generator (Hudson 2002) employed in various empirical evaluations of algorithms for maximum compatibility and perfect phylogeny (Gusfield, Frid, and Brown 2007; Gusfield 2010; Stevens and Gusfield 2010; Gysel and Gusfield 2011; Gysel, Gusfield, and Stevens 2013; Coulombe, Stevens, and Gusfield 2015) extended to multistate instances as described in (Gusfield 2010).
Dataset Splits No The paper describes how instances were generated and parameters varied, but does not specify explicit training, validation, and test dataset splits with percentages or counts.
Hardware Specification Yes The experiments were run single-threaded on computers with 2.4-GHz Intel Xeon E5-2680-v4 processors with a per-instance 2-hour time limit and 32-GB memory limit.
Software Dependencies Yes For the PBO approach, we used the code provided by the authors, in particular the PBO-RR variant which includes additional constraints to strengthen the PBO formulation, which we observed to be the best-performing variant in preliminary experiments. For the experiments on PBO, we used Minisat+ (E en and Sorensson 2006) as the PBO solver. For both of the IP-based approaches, we used CPLEX 12.7.1 (IBM ILOG 2017) as the IP solver in the experiments. We implemented our BT-Max SAT approach based on the Triangulator implementation of BT (Korhonen, Berg, and J arvisalo 2019), and used Max HS (Davies and Bacchus 2013) as the Max SAT solver.
Experiment Setup Yes We generated a comprehensive set of benchmarks using the ms data generator (Hudson 2002) employed in various empirical evaluations of algorithms for maximum compatibility and perfect phylogeny (Gusfield, Frid, and Brown 2007; Gusfield 2010; Stevens and Gusfield 2010; Gysel and Gusfield 2011; Gysel, Gusfield, and Stevens 2013; Coulombe, Stevens, and Gusfield 2015) extended to multistate instances as described in (Gusfield 2010). The generator generates instances based on four parameters: n, the number of taxa; m, the number of characters; k, the number of states for each character; and r, the recombination parameter that controls how far the data is from admitting a perfect phylogeny. Following Gusfield (2010) we set n = m. For the parameter k, the values 4 and 20 are biologically relevant (nucleotide and amino acid) (Wiley and Lieberman 2011). Setting r = 0 guarantees a perfect phylogeny; we let r vary from 0 to 4.