EquiBind: Geometric Deep Learning for Drug Binding Structure Prediction

Authors: Hannes Stärk, Octavian Ganea, Lagnajit Pattanaik, Regina Barzilay, Tommi Jaakkola

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we investigate two settings: re-docking (i.e., taking the bound ligand structure out of a complex and asking the model to dock it) and flexible self-docking (i.e., ligands have no bound structure knowledge prior to docking). ... We provide a new time-based dataset split and preprocessing pipeline for DL drug binding methods... We use protein-ligand complexes from PDBBind... The results in Table 1 show that vanilla EQUIBIND performs well at identifying the approximate binding location and outperforms the baselines...
Researcher Affiliation | Academia | ¹Massachusetts Institute of Technology (MIT), Cambridge, MA, USA. Correspondence to: Hannes Stärk <hstark@mit.edu>.
Pseudocode | No | The paper describes its model and methods in detail, including mathematical formulations and descriptions of its components (e.g., 'DISTANCE GEOMETRIC CONSTRAINTS' and 'FAST POINT CLOUD LIGAND FITTING'). However, it does not include a distinct block explicitly labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | Code to reproduce results or perform fast docking with the provided model weights is available at https://github.com/HannesStark/EquiBind.
Open Datasets | Yes | We use protein-ligand complexes from PDBBind (Liu et al., 2017), which is a subset of the Protein Data Bank (PDB) (Berman et al., 2003) that provides 3D structures of individual proteins and complexes. The newest version, PDBBind v2020, contains 19 443 protein-ligand complexes with 3890 unique receptors and 15 193 unique ligands. Histograms for individual receptor and ligand data frequencies are in Figure 16 and we describe our preprocessing to remove pathologies of the data in Appendix B. We make this data and associated scripts available at https://github.com/HannesStark/EquiBind.
Dataset Splits | Yes | From the remaining complexes that are older than 2019, we remove those with ligands contained in the test set, giving 17 347 complexes for training and validation. These are divided into 968 validation complexes, which share no ligands with the remaining 16 379 train complexes.
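The time-based split quoted above (newer complexes held out for testing, older complexes dropped if their ligand appears in the test set, and validation sharing no ligands with training) can be sketched as follows. This is an illustrative reconstruction, not the authors' preprocessing code; the `Complex` record, `ligand_smiles` field, and `time_based_split` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Complex:
    """Minimal stand-in for one PDBBind protein-ligand complex."""
    pdb_id: str
    year: int
    ligand_smiles: str  # ligand identity, used for overlap checks


def time_based_split(complexes, cutoff_year=2019, n_val=968):
    """Split complexes by deposition year, enforcing ligand disjointness.

    Complexes from cutoff_year onward form the test set; older complexes
    whose ligand also occurs in the test set are removed entirely.
    Validation and training sets are built from whole ligand groups so
    that they never share a ligand.
    """
    test = [c for c in complexes if c.year >= cutoff_year]
    test_ligands = {c.ligand_smiles for c in test}

    pool = [c for c in complexes
            if c.year < cutoff_year and c.ligand_smiles not in test_ligands]

    # Group remaining complexes by ligand so entire groups go to one side.
    by_ligand = {}
    for c in pool:
        by_ligand.setdefault(c.ligand_smiles, []).append(c)

    val, train = [], []
    for group in by_ligand.values():
        (val if len(val) < n_val else train).extend(group)
    return train, val, test
```

Assigning whole ligand groups (rather than individual complexes) is one simple way to guarantee the "share no ligands" property the paper states for its validation/train split.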
Hardware Specification | Yes | We ran all runtime measurements on the same machine using 16 logical CPU cores (except for GLIDE, which does not support multithreading, as detailed in Appendix C), once with and once without access to a 6GB GTX 1060 GPU.
Software Dependencies | No | The paper mentions using the 'RDKit library (Landrum, 2016)', processing with 'Open Babel (Open Babel development team, 2005)', and optimizing with 'Adam (Kingma & Ba, 2014)'. However, specific version numbers for RDKit, Open Babel, or other libraries are not provided.
Experiment Setup | Yes | We use a learning rate of 10⁻⁴ for EQUIBIND and 3·10⁻⁴ for EQUIBIND-R. The learning rate is reduced by a factor of 0.6 after 60 epochs of no improvement in our main validation criterion... All hyperparameters and the employed ligand and node features are described in Appendix C. Table 6 provides the search space for all EQUIBIND models including parameters like LAS DG STEP SIZE, PROPAGATION DEPTH, HIDDEN DIMENSION, LEARNING RATES, DROPOUT, etc.
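The quoted schedule (multiply the learning rate by 0.6 after 60 epochs without improvement in the main validation criterion) matches the behavior of a reduce-on-plateau scheduler, e.g. PyTorch's `ReduceLROnPlateau`. A minimal dependency-free sketch of that logic, assuming a lower-is-better validation metric; the `PlateauLR` class is a hypothetical stand-in, not the authors' code:

```python
class PlateauLR:
    """Reduce the learning rate by `factor` after `patience` epochs
    without improvement in a lower-is-better validation metric."""

    def __init__(self, lr, factor=0.6, patience=60):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")   # best validation metric seen so far
        self.bad_epochs = 0        # epochs since the last improvement

    def step(self, val_metric):
        """Call once per epoch with the validation metric; returns the lr."""
        if val_metric < self.best:
            self.best = val_metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr
```

With `PlateauLR(1e-4, factor=0.6, patience=60)`, a run whose validation metric stops improving would see its learning rate drop from 10⁻⁴ to 6·10⁻⁵ after 60 stagnant epochs, mirroring the paper's description.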