Ancestral protein sequence reconstruction using a tree-structured Ornstein-Uhlenbeck variational autoencoder
Authors: Lys Sanz Moreta, Ola Rønning, Ahmad Salim Al-Sibahi, Jotun Hein, Douglas Theobald, Thomas Hamelryck
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results and ablation studies indicate that the explicit representation of evolution using a suitable tree-structured prior has the potential to improve representation learning of biological sequences considerably. We show that our probabilistic model, called Draupnir, is on par with or better than established ASR methods in accuracy on a standard experimentally derived data set (Alieva et al., 2008; Randall et al., 2016) and several simulated data sets. |
| Researcher Affiliation | Academia | Lys Sanz Moreta, Probabilistic Programming Group, PLTC Section, University of Copenhagen, Copenhagen, Denmark (lys.sanz.moreta@outlook.com) |
| Pseudocode | Yes | The pseudocode of the Draupnir model is given in Algorithm 1, and Figure 1 shows the corresponding graphical model (Section 4.1); Algorithms 2 and 3 are also provided. A hedged Pyro sketch of the tree-structured prior follows the table. |
| Open Source Code | Yes | Draupnir can be found at https://github.com/LysSanzMoreta/DRAUPNIR_ASR and installed as a Python library. |
| Open Datasets | Yes | The data sets include eight simulated data sets generated using the software Evolve AGene (Hall, 2016) and three data sets with experimentally determined ancestral sequences. The description and origin of the data sets can be found in Appendix A.3. Table 3 lists sources such as Randall et al. (2016) and Alieva et al. (2008). |
| Dataset Splits | No | The paper mentions a 'training set' (leaves) and a 'test set' (ancestors) in the context of evaluation (Figure 5 caption), but gives no explicit split percentages, sample counts, or a distinct validation set. Since the task is to infer ancestral sequences, a conventional train/validation/test split of known data does not apply. |
| Hardware Specification | Yes | All programs were executed on an Intel(R) Xeon(R) Gold 6136 CPU @ 3.00GHz machine with a Quadro RTX 6000 GPU. |
| Software Dependencies | No | The paper names key software such as Pyro and the Adam optimizer but does not pin version numbers, e.g., 'Draupnir was implemented in the deep probabilistic programming language Pyro (Bingham et al., 2019)... We use Adam (Kingma & Ba, 2014) as the optimizer...' A version-capture snippet follows the table. |
| Experiment Setup | Yes | In all experiments, n_Z = 30. Adam is used as the optimizer with its default values. Training details for each dataset, including epochs, plate size, and learning-rate scheduler, are in Appendix A.4; Appendix A.8 gives the settings used for the benchmark methods PAML-CodeML, PhyloBayes, FastML, and IQ-TREE. A training-configuration sketch follows the table. |
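The paper's core technique, referenced in the Pseudocode row, is a VAE whose latent codes for all tree nodes (leaves and ancestors) share a tree-structured Ornstein-Uhlenbeck prior. The sketch below is a minimal, hypothetical Pyro rendering of that idea, not Draupnir's Algorithm 1: it assumes a stationary OU kernel K(i, j) = σ² exp(−α d(i, j)) over patristic distances d, an illustrative feed-forward decoder, and a 21-letter alphabet; every name and hyperparameter here (`ou_tree_covariance`, `Decoder`, `model`, `N_AA`) is ours.

```python
import torch
import torch.nn as nn
import pyro
import pyro.distributions as dist

N_AA = 21  # 20 amino acids plus a gap symbol (illustrative alphabet size)

class Decoder(nn.Module):
    """Illustrative decoder: maps a latent code to per-site amino-acid logits."""
    def __init__(self, n_z, seq_len, hidden=64):
        super().__init__()
        self.seq_len = seq_len
        self.net = nn.Sequential(
            nn.Linear(n_z, hidden), nn.ReLU(),
            nn.Linear(hidden, seq_len * N_AA),
        )

    def forward(self, z):
        return self.net(z).reshape(z.shape[0], self.seq_len, N_AA)

def ou_tree_covariance(patristic, sigma2=1.0, alpha=1.0, jitter=1e-4):
    # Stationary OU covariance over tree nodes:
    #   K[i, j] = sigma2 * exp(-alpha * d(i, j)),
    # where d is the patristic (along-branch) distance between nodes i and j.
    n = patristic.shape[0]
    return sigma2 * torch.exp(-alpha * patristic) + jitter * torch.eye(n)

def model(patristic, leaf_idx, leaf_seqs, decoder, n_z=30):
    """patristic: (n_nodes, n_nodes) distances over ALL nodes, ancestors included;
    leaf_idx: row indices of the leaves; leaf_seqs: (n_leaves, seq_len) int codes."""
    pyro.module("decoder", decoder)  # register decoder parameters with Pyro
    n_nodes = patristic.shape[0]
    K = ou_tree_covariance(patristic)
    with pyro.plate("latent_dims", n_z):
        # One tree-structured OU draw per latent dimension: z is (n_z, n_nodes).
        z = pyro.sample("z", dist.MultivariateNormal(
            torch.zeros(n_nodes), covariance_matrix=K))
    logits = decoder(z.T)  # (n_nodes, seq_len, N_AA)
    with pyro.plate("sequences", leaf_seqs.shape[0]):
        # Only leaf sequences are observed; ancestral nodes stay latent and are
        # reconstructed by decoding their inferred latent codes.
        pyro.sample("x", dist.Categorical(logits=logits[leaf_idx]).to_event(1),
                    obs=leaf_seqs)
```

Fitting a variational guide over `z` then yields posteriors for leaf and ancestral latent codes alike; decoding the ancestral codes is what produces the reconstructed sequences.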
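Since the Software Dependencies row flags unpinned versions, a one-off snippet like the following could be run in the training environment to record them for a reproducibility report. It assumes only that `torch` and `pyro-ppl` are installed; both expose a standard `__version__` attribute.

```python
# Record the dependency versions the paper leaves unspecified.
import torch
import pyro

print("torch", torch.__version__)
print("pyro-ppl", pyro.__version__)
```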
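The Experiment Setup row describes Adam with default values plus a per-dataset learning-rate scheduler and epoch count (Appendix A.4). Below is one plausible Pyro SVI configuration matching that description; the `AutoNormal` guide, `MultiStepLR` scheduler, milestones, toy tree, and epoch count are illustrative assumptions, and `model`, `Decoder`, and `N_AA` are reused from the prior sketch above.

```python
import torch
import pyro
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoNormal
from pyro.optim import MultiStepLR

# Toy five-node tree (root R, internal I, leaves A, B, C) so the loop runs
# end to end; row/column order is [R, I, A, B, C].
patristic = torch.tensor([
    [0.0, 0.5, 1.0, 1.0, 1.0],
    [0.5, 0.0, 1.5, 0.5, 0.5],
    [1.0, 1.5, 0.0, 2.0, 2.0],
    [1.0, 0.5, 2.0, 0.0, 1.0],
    [1.0, 0.5, 2.0, 1.0, 0.0],
])
leaf_idx = torch.tensor([2, 3, 4])           # A, B, C are the observed leaves
leaf_seqs = torch.randint(0, N_AA, (3, 50))  # random stand-in sequences
decoder = Decoder(n_z=30, seq_len=50)

guide = AutoNormal(model)  # stand-in for the paper's actual guide

# Pyro's wrapper around torch.optim.Adam (default hyperparameters) with a
# learning-rate schedule; milestones/gamma are illustrative placeholders for
# the per-dataset settings in Appendix A.4.
scheduler = MultiStepLR({
    "optimizer": torch.optim.Adam,
    "optim_args": {"lr": 1e-3},
    "milestones": [2000, 4000],
    "gamma": 0.5,
})
svi = SVI(model, guide, scheduler, loss=Trace_ELBO())

n_epochs = 5000  # per-dataset epoch counts are listed in Appendix A.4
for epoch in range(n_epochs):
    loss = svi.step(patristic, leaf_idx, leaf_seqs, decoder, n_z=30)
    scheduler.step()  # advance the learning-rate schedule once per epoch
```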