EquiBind: Geometric Deep Learning for Drug Binding Structure Prediction

Authors: Hannes Stärk, Octavian Ganea, Lagnajit Pattanaik, Regina Barzilay, Tommi Jaakkola

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we investigate two settings: re-docking (i.e., taking the bound ligand structure out of a complex and asking the model to dock it) and flexible self-docking (i.e., ligands have no bound structure knowledge prior to docking). ... We provide a new time-based dataset split and preprocessing pipeline for DL drug binding methods... We use protein-ligand complexes from PDBBind... The results in Table 1 show that vanilla EQUIBIND performs well at identifying the approximate binding location and outperforms the baselines...
Researcher Affiliation | Academia | ¹Massachusetts Institute of Technology (MIT), Cambridge, MA, USA. Correspondence to: Hannes Stärk <hstark@mit.edu>.
Pseudocode | No | The paper describes its model and methods in detail, including mathematical formulations and descriptions of its components (e.g., 'DISTANCE GEOMETRIC CONSTRAINTS' and 'FAST POINT CLOUD LIGAND FITTING'). However, it does not include a distinct block explicitly labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | Code to reproduce results or perform fast docking with the provided model weights is available at https://github.com/HannesStark/EquiBind.
Open Datasets | Yes | We use protein-ligand complexes from PDBBind (Liu et al., 2017), which is a subset of the Protein Data Bank (PDB) (Berman et al., 2003) that provides 3D structures of individual proteins and complexes. The newest version, PDBBind v2020, contains 19 443 protein-ligand complexes with 3890 unique receptors and 15 193 unique ligands. Histograms for individual receptor and ligand data frequencies are in Figure 16 and we describe our preprocessing to remove pathologies of the data in Appendix B. We make this data and associated scripts available at https://github.com/HannesStark/EquiBind.
Dataset Splits | Yes | From the remaining complexes that are older than 2019, we remove those with ligands contained in the test set, giving 17 347 complexes for training and validation. These are divided into 968 validation complexes, which share no ligands with the remaining 16 379 train complexes.
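The time-based split quoted above (newer complexes held out for testing, older complexes dropped if their ligand appears in the test set, and validation sharing no ligands with training) can be sketched as follows. This is an illustrative reconstruction, not the authors' preprocessing code; the `Complex` record, `ligand_smiles` field, and `time_based_split` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Complex:
    """Minimal stand-in for one PDBBind protein-ligand complex."""
    pdb_id: str
    year: int
    ligand_smiles: str  # ligand identity, used for overlap checks


def time_based_split(complexes, cutoff_year=2019, n_val=968):
    """Split complexes by deposition year, enforcing ligand disjointness.

    Complexes from cutoff_year onward form the test set; older complexes
    whose ligand also occurs in the test set are removed entirely.
    Validation and training sets are built from whole ligand groups so
    that they never share a ligand.
    """
    test = [c for c in complexes if c.year >= cutoff_year]
    test_ligands = {c.ligand_smiles for c in test}

    pool = [c for c in complexes
            if c.year < cutoff_year and c.ligand_smiles not in test_ligands]

    # Group remaining complexes by ligand so entire groups go to one side.
    by_ligand = {}
    for c in pool:
        by_ligand.setdefault(c.ligand_smiles, []).append(c)

    val, train = [], []
    for group in by_ligand.values():
        (val if len(val) < n_val else train).extend(group)
    return train, val, test
```

Assigning whole ligand groups (rather than individual complexes) is one simple way to guarantee the "share no ligands" property the paper states for its validation/train split.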
Hardware Specification | Yes | We ran all runtime measurements on the same machine using 16 logical CPU cores (except for GLIDE, which does not support multithreading, as detailed in Appendix C), once with and once without access to a 6GB GTX 1060 GPU.
Software Dependencies | No | The paper mentions using the 'RDKit library (Landrum, 2016)', processing with 'Open Babel (Open Babel development team, 2005)', and optimizing with 'Adam (Kingma & Ba, 2014)'. However, specific version numbers for RDKit, Open Babel, or other libraries are not provided.
Experiment Setup | Yes | We use a learning rate of 10⁻⁴ for EQUIBIND and 3·10⁻⁴ for EQUIBIND-R. The learning rate is reduced by a factor of 0.6 after 60 epochs of no improvement in our main validation criterion... All hyperparameters and the employed ligand and node features are described in Appendix C. Table 6 provides the search space for all EQUIBIND models including parameters like LAS DG STEP SIZE, PROPAGATION DEPTH, HIDDEN DIMENSION, LEARNING RATES, DROPOUT, etc.
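The quoted schedule (multiply the learning rate by 0.6 after 60 epochs without improvement in the main validation criterion) matches the behavior of a reduce-on-plateau scheduler, e.g. PyTorch's `ReduceLROnPlateau`. A minimal dependency-free sketch of that logic, assuming a lower-is-better validation metric; the `PlateauLR` class is a hypothetical stand-in, not the authors' code:

```python
class PlateauLR:
    """Reduce the learning rate by `factor` after `patience` epochs
    without improvement in a lower-is-better validation metric."""

    def __init__(self, lr, factor=0.6, patience=60):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")   # best validation metric seen so far
        self.bad_epochs = 0        # epochs since the last improvement

    def step(self, val_metric):
        """Call once per epoch with the validation metric; returns the lr."""
        if val_metric < self.best:
            self.best = val_metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr
```

With `PlateauLR(1e-4, factor=0.6, patience=60)`, a run whose validation metric stops improving would see its learning rate drop from 10⁻⁴ to 6·10⁻⁵ after 60 stagnant epochs, mirroring the paper's description.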