Unsupervised Protein-Ligand Binding Energy Prediction via Neural Euler's Rotation Equation
Authors: Wengong Jin, Siranush Sarkizova, Xun Chen, Nir Hacohen, Caroline Uhler
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate NERE on two protein-ligand and antibody-antigen binding affinity benchmarks from PDBBind [36] and Structural Antibody Database (SAbDab) [31]. We compare NERE with unsupervised physics-based models like MM/GBSA [21], protein language models (ESM-1v [20] and ESM-IF [11]), and a variety of supervised models regressed on experimental binding affinity data. To simulate real-world virtual screening scenarios, we consider two settings where input complexes are crystallized or predicted by a docking software. NERE outperforms all unsupervised baselines across all settings and surpasses supervised models in the antibody setting, which highlights the benefit of unsupervised learning when binding affinity data is limited. |
| Researcher Affiliation | Collaboration | Wengong Jin, Siranush Sarkizova, Xun Chen, Nir Hacohen, Caroline Uhler; Broad Institute of MIT and Harvard; {wjin,sarkizov,xun,nhacohen,cuhler}@broadinstitute.org |
| Pseudocode | Yes | Algorithm 1 Training procedure (single data point) |
| Open Source Code | Yes | Our code and data are available at github.com/wengong-jin/DSMBind. |
| Open Datasets | Yes | Our training data has 5237 protein-ligand complexes from the refined subset of PDBbind v2020 database [36]. Our training and test data come from the Structural Antibody Database (SAbDab) [31], which contains 4883 non-redundant antibody-antigen complexes. |
| Dataset Splits | Yes | Our training data has 5237 protein-ligand complexes from the refined subset of PDBbind v2020 database [36]. Our test set has 285 complexes from the PDBbind core set with binding affinity labels converted into log scale. Our validation set has 357 complexes randomly sampled from PDBbind by Stärk et al. [35] after excluding all test cases. |
| Hardware Specification | No | The paper mentions a '64-core CPU server' in the context of a baseline method (MM/GBSA) but does not provide specific model numbers, brands, or detailed specifications for the hardware used for their own experiments. |
| Software Dependencies | No | The paper describes using specific software like 'Autodock Vina' and 'ZDOCK program' and refers to the default hyperparameters from 'Yang et al. [42]' for the MPN encoder, but it does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | For Autodock Vina, we use its default docking parameters with docking grid dimension of 20 Å, grid interval of 0.375 Å, and exhaustiveness of 32. For ZDOCK, we mark antibody CDR residues as ligand binding site and generate 2000 poses for each antibody-antigen pair. We re-score those 2000 poses by ZRANK2 and select the best candidate. For the protein encoder, we set hidden layer dimension to be 256 and try encoder depth L ∈ {1, 2, 3} and distance threshold d ∈ {5.0, 10.0}. In the antibody case, we try encoder depth from L ∈ {1, 2, 3} and distance threshold d ∈ {10.0, 20.0}. |
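
The encoder search space quoted in the Experiment Setup row can be written out as a small grid. The following is a minimal sketch, not code from the DSMBind repository: the names `HIDDEN_DIM`, `SEARCH_SPACE`, `enumerate_configs`, and the dictionary keys are hypothetical, and only the numeric values (hidden dimension 256, depths 1 to 3, and the two distance-threshold sets) come from the table above.

```python
from itertools import product

# Values taken from the Experiment Setup row; all names are illustrative assumptions.
HIDDEN_DIM = 256
SEARCH_SPACE = {
    "protein_ligand": {"depth": [1, 2, 3], "dist_threshold": [5.0, 10.0]},
    "antibody_antigen": {"depth": [1, 2, 3], "dist_threshold": [10.0, 20.0]},
}


def enumerate_configs(task):
    """Return every (depth, distance threshold) combination for one task."""
    space = SEARCH_SPACE[task]
    return [
        {"hidden_dim": HIDDEN_DIM, "depth": depth, "dist_threshold": dist}
        for depth, dist in product(space["depth"], space["dist_threshold"])
    ]


if __name__ == "__main__":
    for task in SEARCH_SPACE:
        configs = enumerate_configs(task)
        print(task, len(configs), "candidate configurations")
```

Each task yields six candidate configurations (three depths times two thresholds). How the paper selects among them is not stated in the row above, so the selection step is left out of the sketch.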