End-to-End Learning on 3D Protein Structure for Interface Prediction
Authors: Raphael Townshend, Rishi Bedi, Patricia Suriana, Ron Dror
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We built a training dataset, the Database of Interacting Protein Structures (DIPS), that contains biases but is two orders of magnitude larger than those used previously. We found that these biases significantly degrade the performance of existing methods on gold-standard data. Hypothesizing that assumptions baked into the hand-crafted features on which these methods depend were the source of the problem, we developed the first end-to-end learning model for protein interface prediction, the Siamese Atomic Surfacelet Network (SASNet). Using only spatial coordinates and identities of atoms, SASNet outperforms state-of-the-art methods trained on gold-standard structural data, even when trained on only 3% of our new dataset. Code and data available at https://github.com/drorlab/DIPS. |
| Researcher Affiliation | Academia | Raphael J. L. Townshend Stanford University raphael@cs.stanford.edu Rishi Bedi Stanford University rbedi@cs.stanford.edu Patricia A. Suriana Stanford University psuriana@stanford.edu Ron O. Dror Stanford University rondror@cs.stanford.edu |
| Pseudocode | No | The paper describes the SASNet architecture textually and with a diagram (Figure 2F), but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and data available at https://github.com/drorlab/DIPS. |
| Open Datasets | Yes | We built a training dataset, the Database of Interacting Protein Structures (DIPS), that contains biases but is two orders of magnitude larger than those used previously. |
| Dataset Splits | Yes | State-of-the-art methods [23], [24] further split DB5 into a training/validation set of 175 complexes, DB5-train, corresponding to DB4 (the complexes from the previous version, Docking Benchmark 4) and a test set, DB5-test, of 55 complexes (the complexes added in the update from DB4 to DB5). |
| Hardware Specification | Yes | All models were trained across 4 Titan X GPUs using data-level parallelism, and the best model took 12 hours to train. |
| Software Dependencies | No | The paper mentions using RMSProp optimizer and convolutional neural networks, but does not specify versions for any programming languages or software libraries (e.g., Python, TensorFlow, PyTorch, scikit-learn). |
| Experiment Setup | Yes | Our model with the best validation performance involved training on 163840 examples, featurizing a grid of edge length 41 Å with voxel resolution of 1 Å (thus starting at a cube size of 41x41x41), and then applying 6 layers of convolution (each of size 3x3x3, with the 6 layers having 32, 32, 64, 64, 128, 128 convolutional filters, respectively) and 2 layers of max pooling... A fully connected layer with 512 parameters lays at the top of each tower, and the outputs of both towers are concatenated and passed through two more fully connected layers with 512 parameters each, leading to the final prediction. The number of filters used in each convolutional layer is doubled every other layer to allow for an increase of the specificity of the filters as the spatial resolution decreases. We use the RMSProp optimizer with a learning rate of 0.0001. The positive-negative class imbalance was set to 1:1. |
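The architecture quoted in the Experiment Setup row can be sketched as a siamese 3D CNN. The following is a minimal PyTorch sketch, not the authors' implementation: the number of input atom-type channels (here 4, for C/N/O/S) and the placement of the two max-pooling layers (here after the 2nd and 4th convolutions) are assumptions not fully specified in the quoted text.

```python
import torch
import torch.nn as nn


class SASNetSketch(nn.Module):
    """Illustrative sketch of the SASNet siamese 3D CNN.

    Two weight-shared towers encode a pair of voxelized "surfacelets"
    (41x41x41 grids at 1 Angstrom resolution); their embeddings are
    concatenated and passed through fully connected layers to produce
    a binary interface-contact prediction.
    """

    def __init__(self, in_channels: int = 4):  # 4 atom-type channels is an assumption
        super().__init__()
        filters = [32, 32, 64, 64, 128, 128]  # from the quoted setup
        layers = []
        prev = in_channels
        for i, f in enumerate(filters):
            layers.append(nn.Conv3d(prev, f, kernel_size=3, padding=1))
            layers.append(nn.ReLU())
            if i in (1, 3):  # assumption: pool after the 2nd and 4th conv layers
                layers.append(nn.MaxPool3d(2))
            prev = f
        self.tower = nn.Sequential(*layers)
        # 41 -> 20 -> 10 after two stride-2 pools, so 128 * 10^3 features per tower
        self.tower_fc = nn.Sequential(
            nn.Flatten(), nn.Linear(128 * 10 ** 3, 512), nn.ReLU()
        )
        self.head = nn.Sequential(
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, 1),  # logit for the positive (interacting) class
        )

    def forward(self, grid_a: torch.Tensor, grid_b: torch.Tensor) -> torch.Tensor:
        # Shared weights: the same tower encodes both input grids.
        za = self.tower_fc(self.tower(grid_a))
        zb = self.tower_fc(self.tower(grid_b))
        return self.head(torch.cat([za, zb], dim=1))
```

Training it as the paper describes would use RMSProp (`torch.optim.RMSprop`, learning rate 1e-4) with a 1:1 positive-negative class balance; those details are in the quoted setup, while everything else above is a sketch.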