Ancestral protein sequence reconstruction using a tree-structured Ornstein-Uhlenbeck variational autoencoder

Authors: Lys Sanz Moreta, Ola Rønning, Ahmad Salim Al-Sibahi, Jotun Hein, Douglas Theobald, Thomas Hamelryck

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our results and ablation studies indicate that the explicit representation of evolution using a suitable tree-structured prior has the potential to improve representation learning of biological sequences considerably. We show that our probabilistic model, called Draupnir, is about on par with or better than the accuracy of established ASR methods for a standard experimentally derived data set (Alieva et al., 2008; Randall et al., 2016) and several simulated data sets. (A toy sketch of the tree-structured OU prior follows the table.)
Researcher Affiliation | Academia | Lys Sanz Moreta, Probabilistic Programming Group, PLTC Section, University of Copenhagen, Copenhagen, Denmark; lys.sanz.moreta@outlook.com
Pseudocode | Yes | The pseudocode of the Draupnir model is given in Algorithm 1; Figure 1 shows the corresponding graphical model (Section 4.1). Algorithms 2 and 3 are also provided.
Open Source Code | Yes | Draupnir is available at https://github.com/LysSanzMoreta/DRAUPNIR_ASR and can be installed as a Python library.
Open Datasets | Yes | The data sets include eight simulated data sets generated using the software EvolveAGene (Hall, 2016) and three data sets with experimentally determined ancestral sequences. The description and origin of the data sets can be found in Appendix A.3; Table 3 lists sources such as Randall et al. (2016) and Alieva et al. (2008).
Dataset Splits | No | The paper mentions a 'training set' (the leaves) and a 'test set' (the ancestors) in the context of evaluation (Figure 5 caption), but does not provide explicit split percentages, sample counts, or a distinct validation set. The task is to infer ancestral sequences, so a conventional train/validation/test split of known data does not apply. (A sketch of the leaves-versus-ancestors convention follows the table.)
Hardware Specification | Yes | All programs were executed on an Intel(R) Xeon(R) Gold 6136 CPU @ 3.00GHz machine with a Quadro RTX 6000 GPU.
Software Dependencies | No | The paper mentions key software such as Pyro and the Adam optimizer but does not specify version numbers, e.g., 'Draupnir was implemented in the deep probabilistic programming language Pyro (Bingham et al., 2019)... We use Adam (Kingma & Ba, 2014) as the optimizer...'
Experiment Setup | Yes | In all experiments, n_Z = 30. Adam is used as the optimizer with its default values (a minimal Pyro/SVI sketch follows the table). Training details, including epochs, plate size, and the learning-rate scheduler for each data set, can be found in Appendix A.4. Appendix A.8 provides detailed settings for the benchmarked methods PAML CodeML, PhyloBayes, FastML, and IQ-TREE.
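
For intuition on the tree-structured Ornstein-Uhlenbeck prior at the heart of Draupnir, below is a minimal NumPy sketch that samples per-node latent variables along a toy phylogeny using the standard OU transition density. The toy tree, the parameter values (theta, mu, sigma), and the function name ou_transition are illustrative assumptions rather than the paper's parameterization; only the latent dimensionality n_Z = 30 comes from the paper.

```python
import numpy as np

def ou_transition(z_parent, t, theta=1.0, mu=0.0, sigma=1.0, rng=None):
    # OU transition: for dZ = theta*(mu - Z) dt + sigma dW, the child latent
    # given the parent latent after a branch of length t is Gaussian with
    # the mean and variance below (parameter values are hypothetical).
    rng = rng if rng is not None else np.random.default_rng()
    decay = np.exp(-theta * t)
    mean = z_parent * decay + mu * (1.0 - decay)
    var = (sigma ** 2) / (2.0 * theta) * (1.0 - np.exp(-2.0 * theta * t))
    return rng.normal(mean, np.sqrt(var))

# Toy rooted tree: node -> (parent, branch length); insertion order is
# parent-before-child, so a single pass over the dict suffices.
tree = {
    "A": ("root", 0.3), "B": ("root", 0.5),
    "leaf1": ("A", 0.2), "leaf2": ("A", 0.4), "leaf3": ("B", 0.1),
}

n_z = 30  # latent dimensionality used in the paper
rng = np.random.default_rng(0)
latents = {"root": rng.normal(0.0, 1.0, size=n_z)}  # draw the root latent
for node, (parent, t) in tree.items():
    latents[node] = ou_transition(latents[parent], t, rng=rng)
```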
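To make the leaves-versus-ancestors evaluation convention concrete, here is a small sketch that partitions a toy Newick tree into observed leaves ('training set') and unobserved internal ancestors ('test set'). Biopython's Bio.Phylo and the toy tree are assumptions for illustration; the paper does not describe its tree-handling code.

```python
from io import StringIO
from Bio import Phylo  # Biopython; assumed here, not named by the paper

# Toy Newick tree with named internal (ancestral) nodes.
newick = "((leaf1:0.2,leaf2:0.4)A:0.3,leaf3:0.1)root;"
tree = Phylo.read(StringIO(newick), "newick")

# Leaves carry the observed sequences ("training set"); internal nodes are
# the unobserved ancestors whose sequences are reconstructed ("test set").
training_nodes = [clade.name for clade in tree.get_terminals()]
test_nodes = [clade.name for clade in tree.get_nonterminals()]
print(training_nodes)  # ['leaf1', 'leaf2', 'leaf3']
print(test_nodes)      # ['root', 'A']
```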
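Finally, a minimal sketch of what the reported optimization setup could look like in Pyro: SVI with a Trace_ELBO loss and Adam at its default hyperparameters, with n_Z = 30. The model and guide bodies are placeholders, not Draupnir's architecture (which places the tree-structured OU prior of Algorithm 1 over per-node latents).

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.distributions import constraints
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

n_z = 30  # latent dimensionality reported in the paper

def model(data):
    # Placeholder prior/likelihood; Draupnir's actual model places a
    # tree-structured OU prior over per-node latents (Algorithm 1).
    z = pyro.sample("z", dist.Normal(torch.zeros(n_z), 1.0).to_event(1))
    pyro.sample("obs", dist.Normal(z.sum(), 1.0), obs=data)

def guide(data):
    # Mean-field Gaussian variational posterior over z.
    loc = pyro.param("loc", torch.zeros(n_z))
    scale = pyro.param("scale", torch.ones(n_z), constraint=constraints.positive)
    pyro.sample("z", dist.Normal(loc, scale).to_event(1))

# Adam with default hyperparameters, matching "default values" above.
svi = SVI(model, guide, Adam({}), loss=Trace_ELBO())
data = torch.tensor(0.0)  # dummy observation for the placeholder model
for epoch in range(100):  # per-data-set epoch counts are in Appendix A.4
    loss = svi.step(data)
```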