End-to-End Learning on 3D Protein Structure for Interface Prediction

Authors: Raphael Townshend, Rishi Bedi, Patricia Suriana, Ron Dror

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We built a training dataset, the Database of Interacting Protein Structures (DIPS), that contains biases but is two orders of magnitude larger than those used previously. We found that these biases significantly degrade the performance of existing methods on gold-standard data. Hypothesizing that assumptions baked into the hand-crafted features on which these methods depend were the source of the problem, we developed the first end-to-end learning model for protein interface prediction, the Siamese Atomic Surfacelet Network (SASNet). Using only spatial coordinates and identities of atoms, SASNet outperforms state-of-the-art methods trained on gold-standard structural data, even when trained on only 3% of our new dataset. Code and data available at https://github.com/drorlab/DIPS.
Researcher Affiliation | Academia | Raphael J. L. Townshend, Stanford University, raphael@cs.stanford.edu; Rishi Bedi, Stanford University, rbedi@stanford.edu; Patricia A. Suriana, Stanford University, psuriana@stanford.edu; Ron O. Dror, Stanford University, rondror@cs.stanford.edu
Pseudocode | No | The paper describes the SASNet architecture textually and with a diagram (Figure 2F), but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Code and data available at https://github.com/drorlab/DIPS.
Open Datasets | Yes | We built a training dataset, the Database of Interacting Protein Structures (DIPS), that contains biases but is two orders of magnitude larger than those used previously.
Dataset Splits | Yes | State-of-the-art methods [23], [24] further split DB5 into a training/validation set of 175 complexes, DB5-train, corresponding to DB4 (the complexes from the previous version, Docking Benchmark 4) and a test set, DB5-test, of 55 complexes (the complexes added in the update from DB4 to DB5). [See the split sketch after this table.]
Hardware Specification | Yes | All models were trained across 4 Titan X GPUs using data-level parallelism, and the best model took 12 hours to train. [See the multi-GPU sketch after this table.]
Software Dependencies | No | The paper mentions using the RMSProp optimizer and convolutional neural networks, but does not specify versions for any programming languages or software libraries (e.g., Python, TensorFlow, PyTorch, scikit-learn).
Experiment Setup | Yes | Our model with the best validation performance involved training on 163840 examples, featurizing a grid of edge length 41 Å with voxel resolution of 1 Å (thus starting at a cube size of 41x41x41), and then applying 6 layers of convolution (each of size 3x3x3, with the 6 layers having 32, 32, 64, 64, 128, 128 convolutional filters, respectively) and 2 layers of max pooling... A fully connected layer with 512 parameters lays at the top of each tower, and the outputs of both towers are concatenated and passed through two more fully connected layers with 512 parameters each, leading to the final prediction. The number of filters used in each convolutional layer is doubled every other layer to allow for an increase of the specificity of the filters as the spatial resolution decreases. We use the RMSProp optimizer with a learning rate of 0.0001. The positive-negative class imbalance was set to 1:1. [See the architecture sketch after this table.]
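
Split sketch. The Dataset Splits row describes the DB5-train / DB5-test split as membership in Docking Benchmark 4: complexes already present in DB4 form the 175-complex training/validation set, and the 55 complexes added in the DB5 update form the test set. Below is a minimal sketch of that rule; the PDB codes shown are placeholders, not the actual benchmark lists.

# Hypothetical helper for the DB5 split quoted above: complexes already in
# Docking Benchmark 4 (DB4) go to DB5-train, newly added complexes go to DB5-test.
def split_db5(db5_ids, db4_ids):
    db4_ids = set(db4_ids)
    db5_train = [c for c in db5_ids if c in db4_ids]      # 175 complexes in the paper
    db5_test = [c for c in db5_ids if c not in db4_ids]   # 55 complexes added in DB5
    return db5_train, db5_test

# Placeholder PDB codes, for illustration only.
train, test = split_db5(["1AHW", "1BVK", "3EO1"], ["1AHW", "1BVK"])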
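
Multi-GPU sketch. The Hardware Specification row states that training used data-level parallelism across 4 Titan X GPUs but does not say which framework was used. The sketch below illustrates the general idea with PyTorch's DataParallel, which replicates a model on each device and splits every minibatch across them; the stand-in model, channel count, and batch shape are assumptions.

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Stand-in 3D CNN; any nn.Module is handled the same way by DataParallel.
model = nn.Sequential(nn.Conv3d(4, 32, kernel_size=3, padding=1), nn.ReLU()).to(device)
if torch.cuda.device_count() >= 4:
    # Replicate the model on GPUs 0-3; each forward pass scatters the batch across them.
    model = nn.DataParallel(model, device_ids=[0, 1, 2, 3])
batch = torch.randn(16, 4, 41, 41, 41, device=device)  # voxel grids; channel count assumed
out = model(batch)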
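
Architecture sketch. The Experiment Setup row specifies the filter counts, the 512-unit tower and head layers, the optimizer, and the learning rate, but not the activation functions, the number of input channels, or exactly where the two max-pooling layers sit. The following is one plausible PyTorch reading of that description; the ReLU activations, the 4 atom-type input channels, the pooling placement, and the single-logit output are assumptions rather than details taken from the paper.

import torch
import torch.nn as nn

class SASNetSketch(nn.Module):
    """Rough siamese 3D-CNN following the Experiment Setup description above."""
    def __init__(self, in_channels=4, grid_size=41):
        super().__init__()
        filters = [32, 32, 64, 64, 128, 128]          # six 3x3x3 convolutions
        layers, prev = [], in_channels
        for i, f in enumerate(filters):
            layers += [nn.Conv3d(prev, f, kernel_size=3, padding=1), nn.ReLU()]
            if i in (1, 3):                           # two max-pooling layers; placement assumed
                layers.append(nn.MaxPool3d(2))
            prev = f
        self.tower = nn.Sequential(*layers, nn.Flatten())
        with torch.no_grad():                         # infer flattened size from a dummy grid
            flat = self.tower(torch.zeros(1, in_channels, grid_size, grid_size, grid_size)).shape[1]
        self.tower_fc = nn.Sequential(nn.Linear(flat, 512), nn.ReLU())
        self.head = nn.Sequential(                    # concatenated towers -> two 512-unit layers
            nn.Linear(2 * 512, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, 1),                        # interface / non-interface logit
        )

    def forward(self, grid_a, grid_b):
        # Siamese towers: the same weights process the surfacelet grids from both partners.
        za = self.tower_fc(self.tower(grid_a))
        zb = self.tower_fc(self.tower(grid_b))
        return self.head(torch.cat([za, zb], dim=1))

model = SASNetSketch()
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-4)  # RMSProp, learning rate 0.0001

Placing the poolings after the second and fourth convolutions is consistent with the quoted statement that filter counts double every other layer as spatial resolution decreases, but the paper does not state the placement explicitly.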