Chemically Transferable Generative Backmapping of Coarse-Grained Proteins

Authors: Soojung Yang, Rafael Gómez-Bombarelli

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments, we perform ablation studies on the model architecture and loss functions, and compare our model with the baseline, CGVAE. Each experiment is run with five random seeds, and we report the mean and variance of the metrics (Table 1: ablation study on the model architecture).
Researcher Affiliation | Academia | (1) Computational and Systems Biology, MIT, Cambridge, MA, United States; (2) Department of Materials Science and Engineering, MIT, Cambridge, MA, United States.
Pseudocode | Yes | Algorithm 1: pseudocode for reconstructing L, the list of Cartesian coordinates of the side-chain atoms of a residue with m side-chain atoms (a hedged sketch of this kind of reconstruction follows the table).
Open Source Code | Yes | Code and dataset for training and inference are available at https://github.com/learningmatter-mit/GenZProt.
Open Datasets | Yes | Our training and test data are from the protein structural ensemble database PED (Lazar et al., 2021).
Dataset Splits | Yes | We split the train and test sets by protein entry (i.e., models never see the test protein entries during training). The validation set is identical to the test set, and learning-rate reduction and early stopping are controlled by the validation loss. Of the 227 total PED entries, we use 84 for training, four for validation, and four for testing (an illustrative split sketch follows the table).
Hardware Specification | Yes | Models were trained on Xeon-G6 GPU nodes until convergence, with a maximum runtime of 20 hours.
Software Dependencies | No | The paper mentions software such as the e3nn library and PyTorch's nn.Embedding but does not specify version numbers.
Experiment Setup | Yes | Table 8 lists the hyperparameters: node-wise latent variable dimension 36; atom neighbor cutoff 9.0 Å; residue neighbor cutoff 21.0 Å; encoder convolution depth 3; decoder convolution depth 4; maximum training time 20 h; batch size 4; learning rate 1e-3; KL-divergence coefficient β = 0.05; L_local coefficient γ = 1.0; L_torsion coefficient δ = 1.0; L_xyz coefficient η = 1.0; L_steric coefficient ζ = 3.0 (collected into a config sketch below).
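
The side-chain reconstruction that Algorithm 1 describes converts predicted internal coordinates (bond lengths, bond angles, torsions) into Cartesian positions. Below is a minimal Python sketch of that kind of procedure using the standard NeRF placement rule; it is not the authors' Algorithm 1, it assumes a simple linear chain of parent atoms, and every name in it is illustrative.

```python
import numpy as np

def place_atom(a, b, c, bond, angle, torsion):
    """Place atom d given already-placed parents a-b-c, the bond length |c-d|,
    the bond angle b-c-d (radians), and the torsion a-b-c-d (radians).
    This is the standard NeRF internal-to-Cartesian construction."""
    bc = (c - b) / np.linalg.norm(c - b)
    n = np.cross(b - a, bc)
    n /= np.linalg.norm(n)
    m = np.cross(n, bc)
    # Displacement of d expressed in the local (bc, m, n) frame.
    d_local = bond * np.array([-np.cos(angle),
                               np.sin(angle) * np.cos(torsion),
                               np.sin(angle) * np.sin(torsion)])
    return c + d_local[0] * bc + d_local[1] * m + d_local[2] * n

def reconstruct_side_chain(backbone, internal_coords):
    """Build the list L of side-chain coordinates for a residue with m atoms.
    `backbone` holds three seed atoms (e.g. N, CA, C as 3-vectors) and
    `internal_coords` holds m (bond, angle, torsion) triples. Real side
    chains branch, so the parent triple would be looked up per atom rather
    than taken from the last three placements as done here."""
    placed = [np.asarray(x, dtype=float) for x in backbone]
    L = []
    for bond, angle, torsion in internal_coords:
        d = place_atom(placed[-3], placed[-2], placed[-1], bond, angle, torsion)
        placed.append(d)
        L.append(d)
    return L
```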
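
The entry-level split quoted above is easy to mirror in code. This sketch partitions PED entry IDs so that no test entry is ever seen during training, and reuses the test entries as the validation set, as the paper states; the selection logic and names are assumptions, not the authors' script.

```python
import random

def split_ped_entries(entry_ids, n_train=84, n_test=4, seed=0):
    """Split PED entry IDs into train/val/test by whole entries.
    Only 84 + 4 = 88 of the 227 entries end up used; the rest are dropped."""
    ids = sorted(entry_ids)
    random.Random(seed).shuffle(ids)
    train = ids[:n_train]
    test = ids[n_train:n_train + n_test]
    # Per the paper, validation is identical to the test set and is used only
    # for learning-rate reduction and early stopping.
    val = list(test)
    return train, val, test
```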
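
Finally, the Table 8 hyperparameters gathered into a plain config dict for quick reference. The values are exactly those quoted in the Experiment Setup row; the key names are illustrative and do not necessarily match the identifiers used in the GenZProt repository.

```python
# Table 8 hyperparameters (values from the paper; key names are illustrative).
GENZPROT_HPARAMS = {
    "node_latent_dim": 36,      # node-wise latent variable dimension
    "atom_cutoff_A": 9.0,       # atom neighbor cutoff [Å]
    "residue_cutoff_A": 21.0,   # residue neighbor cutoff [Å]
    "encoder_conv_depth": 3,
    "decoder_conv_depth": 4,
    "max_train_hours": 20,
    "batch_size": 4,
    "learning_rate": 1e-3,
    "beta_kl": 0.05,            # β, weight on the KL divergence
    "gamma_local": 1.0,         # γ, weight on L_local
    "delta_torsion": 1.0,       # δ, weight on L_torsion
    "eta_xyz": 1.0,             # η, weight on L_xyz
    "zeta_steric": 3.0,         # ζ, weight on L_steric
}
```

The Greek-letter entries read as weights in a summed training objective (a reconstruction term plus β·KL + γ·L_local + δ·L_torsion + η·L_xyz + ζ·L_steric), assuming the usual weighted-sum form; the exact composition of the loss should be checked against the paper.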