Generative Models for Graph-Based Protein Design

Authors: John Ingraham, Vikas Garg, Regina Barzilay, Tommi Jaakkola

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the merits of our approach via a detailed empirical study. Specifically, we evaluate our model's ability to generalize to sequences of protein 3D folds that are topologically distinct from those in the training set.
Researcher Affiliation | Academia | John Ingraham, Vikas K. Garg, Regina Barzilay, Tommi Jaakkola; Computer Science and Artificial Intelligence Lab, MIT; {ingraham, vgarg, regina, tommi}@csail.mit.edu
Pseudocode | No | The paper describes the model architecture and components in text and diagrams, but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at github.com/jingraham/neurips19-graph-protein-design.
Open Datasets | Yes | To evaluate the ability of our models to generalize across different protein folds, we collected a dataset based on the CATH hierarchical classification of protein structure [40].
Dataset Splits | Yes | For all domains in the CATH 4.2 40% non-redundant set of proteins, we obtained full chains up to length 500 and then randomly assigned their CATH topology classifications (CAT codes) to train, validation, and test sets at a targeted 80/10/10 split. This resulted in a dataset of 18024 chains in the training set, 608 chains in the validation set, and 1120 chains in the test set. (See the split sketch after this table.)
Hardware Specification | Yes | CPU: single core of an Intel Xeon Gold 5115; GPU: NVIDIA RTX 2080.
Software Dependencies | Yes | We used the latest version of Rosetta (3.10) to design sequences for our Single chain test set with the fixbb fixed-backbone design protocol and default parameters (Table 4a).
Experiment Setup | Yes | In all experiments, we used three layers of self-attention and position-wise feedforward modules for the encoder and decoder with a hidden dimension of 128. Optimization: we trained models using the learning rate schedule and initialization of the original Transformer paper [7], a dropout rate of 10% [42], a label smoothing rate of 10%, and early stopping based on validation perplexity. (See the training-configuration sketch after this table.)
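
The Dataset Splits row describes splitting at the level of CATH topology (CAT) codes rather than individual chains, so that structurally related folds never straddle train and test. Below is a minimal sketch of that procedure, assuming a precomputed mapping from chain identifiers to CAT codes; the function name, fractions argument, and seed handling are illustrative and not taken from the authors' released preprocessing code.

```python
import random
from collections import defaultdict

def split_by_cat_code(chain_to_cat, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign CAT codes (e.g. "1.10.8" = Class.Architecture.Topology) to
    train/validation/test at the targeted fractions, then place every chain
    carrying a given CAT code into that partition."""
    cat_to_chains = defaultdict(list)
    for chain, cat in chain_to_cat.items():
        cat_to_chains[cat].append(chain)

    cat_codes = sorted(cat_to_chains)
    random.Random(seed).shuffle(cat_codes)

    n_train = int(fractions[0] * len(cat_codes))
    n_val = int(fractions[1] * len(cat_codes))
    code_splits = {
        "train": cat_codes[:n_train],
        "validation": cat_codes[n_train:n_train + n_val],
        "test": cat_codes[n_train + n_val:],
    }
    # Expand each partition of CAT codes back into its member chains.
    return {
        name: [chain for code in codes for chain in cat_to_chains[code]]
        for name, codes in code_splits.items()
    }
```

Because whole topology classes of varying size are assigned to a single partition, the realized chain counts need not match the 80/10/10 ratio exactly, consistent with the "targeted" wording in the quote above.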
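
For the Experiment Setup row, the sketch below shows one way the quoted training hyperparameters (hidden dimension 128, 10% dropout, 10% label smoothing, and the learning rate schedule of the original Transformer) could be wired together in PyTorch. The warmup length, optimizer settings, and placeholder model are assumptions carried over from the Transformer paper's defaults, not values confirmed by this paper or its code.

```python
import torch

HIDDEN_DIM = 128       # hidden dimension quoted in the paper
DROPOUT = 0.1          # 10% dropout, applied inside each encoder/decoder layer (not shown)
LABEL_SMOOTHING = 0.1  # 10% label smoothing
WARMUP_STEPS = 4000    # Transformer-paper default; assumed, not stated in this paper

def noam_lr(step, d_model=HIDDEN_DIM, warmup=WARMUP_STEPS):
    """Learning-rate schedule from the original Transformer paper:
    lr = d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)."""
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

# Placeholder module standing in for the paper's three-layer graph-based
# encoder/decoder; only the optimization wiring is illustrated here.
model = torch.nn.Linear(HIDDEN_DIM, 20)

optimizer = torch.optim.Adam(model.parameters(), lr=1.0,
                             betas=(0.9, 0.98), eps=1e-9)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=noam_lr)
criterion = torch.nn.CrossEntropyLoss(label_smoothing=LABEL_SMOOTHING)
```

In a training loop, scheduler.step() would be called once per optimization step, and early stopping would monitor validation perplexity as described in the table.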