Efficient Generative Modelling of Protein Structure Fragments using a Deep Markov Model

Authors: Christian B Thygesen, Christian Skjødt Steenmans, Ahmad Salim Al-Sibahi, Lys Sanz Moreta, Anders Bundgård Sørensen, Thomas Hamelryck

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | BIFROST was trained on a data set of fragments derived from a set of 3733 proteins from the CullPDB data set (Wang & Dunbrack, 2005). Prior to training, the data was randomly split into train, test, and validation sets with a 60/20/20% ratio. BIFROST was benchmarked against Rosetta's fragment picker (Gront et al., 2011) using the precision and coverage metrics.
Researcher Affiliation | Collaboration | (1) Department of Computer Science, University of Copenhagen, Copenhagen, Denmark; (2) Evaxion Biotech, Copenhagen, Denmark; (3) Department of Biology, University of Copenhagen, Copenhagen, Denmark.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not include an unambiguous statement about releasing open-source code or provide a direct link to a code repository for the methodology described.
Open Datasets | Yes | BIFROST was trained on a data set of fragments derived from a set of 3733 proteins from the CullPDB data set (Wang & Dunbrack, 2005).
Dataset Splits | Yes | Prior to training, the data was randomly split into train, test, and validation sets with a 60/20/20% ratio.
Hardware Specification | Yes | Training and testing were carried out on a machine equipped with an Intel Xeon CPU E5-2630 and a Tesla M10 GPU.
Software Dependencies | Yes | The presented model was implemented in the deep probabilistic programming language Pyro, version 1.3.0 (Bingham et al., 2019) and PyTorch version 1.4.0 (Paszke et al., 2019).
Experiment Setup | Yes | The final model was trained with a learning rate of 0.0003 with a scheduler reducing the learning rate by 90% when no improvement was seen for 10 epochs. Minibatch size was 200. The Adam optimiser was used with β1 and β2 of 0.96 and 0.999, respectively. The latent space dimensionality was 40. All hidden activations (if not specified above) were ReLU activations. We employed norm scaling of the gradient to a norm of 10.0. Finally, early stopping was employed with a patience of 50 epochs.
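
The quoted dataset-split and experiment-setup details map onto a fairly standard PyTorch training configuration. The sketch below is a minimal, hedged illustration of that configuration, not the authors' code (none is released): only the hyperparameter values (60/20/20% split, minibatch size 200, Adam with learning rate 0.0003 and betas 0.96/0.999, 90% learning-rate reduction after 10 stagnant epochs, gradient norm scaling to 10.0, early stopping with patience 50) come from the paper. The toy model, toy data, and loss function are illustrative placeholders standing in for BIFROST's deep Markov model (latent dimensionality 40, implemented in Pyro 1.3.0 on PyTorch 1.4.0) and the fragment data set.

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, random_split

torch.manual_seed(0)

# Toy stand-ins: the paper's model is a deep Markov model written in Pyro 1.3.0
# (pip package "pyro-ppl") on PyTorch 1.4.0; a small regression net and random
# tensors are used here only to make the sketch executable end to end.
model = nn.Sequential(nn.Linear(9, 64), nn.ReLU(), nn.Linear(64, 9))
loss_fn = nn.MSELoss()
data = TensorDataset(torch.randn(1000, 9), torch.randn(1000, 9))

# 60/20/20% train/validation/test split, as quoted under "Dataset Splits".
n_train, n_val = int(0.6 * len(data)), int(0.2 * len(data))
train_set, val_set, test_set = random_split(
    data, [n_train, n_val, len(data) - n_train - n_val])
train_loader = DataLoader(train_set, batch_size=200, shuffle=True)  # minibatch size 200
val_loader = DataLoader(val_set, batch_size=200)

# Adam with learning rate 0.0003 and betas (0.96, 0.999), as quoted.
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4, betas=(0.96, 0.999))
# Reduce the learning rate by 90% when no improvement is seen for 10 epochs.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, patience=10)

best_val, stale_epochs = float("inf"), 0
for epoch in range(1000):
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        # Norm scaling of the gradient to a norm of 10.0.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
        optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = sum(loss_fn(model(x), y).item() for x, y in val_loader) / len(val_loader)
    scheduler.step(val_loss)

    # Early stopping with a patience of 50 epochs.
    if val_loss < best_val:
        best_val, stale_epochs = val_loss, 0
    else:
        stale_epochs += 1
        if stale_epochs >= 50:
            break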