Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

ShEPhERD: Diffusing shape, electrostatics, and pharmacophores for bioisosteric drug design

Authors: Keir Adams, Kento Abeywardane, Jenna Fromer, Connor Coley

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We train and evaluate ShEPhERD using two new datasets. Our first dataset (ShEPhERD-GDB17) contains 2.8M molecules sampled from medicinally-relevant subsets of GDB17... Our second dataset (ShEPhERD-MOSES-aq) contains 1.6M drug-like molecules from MOSES... We demonstrate ShEPhERD's potential for impact via exemplary drug design tasks including natural product ligand hopping, protein-blind bioactive hit diversification, and bioisosteric fragment merging.
Researcher Affiliation | Academia | Keir Adams, Kento Abeywardane, Jenna Fromer, & Connor W. Coley, Massachusetts Institute of Technology, Cambridge, MA 02139, USA, EMAIL
Pseudocode | Yes | Algorithm 1: Denoising Module for x1; Algorithm 2: Forward Pass of ShEPhERD's Denoising Network; Algorithm 3: Training Algorithm; Algorithm 4: Sampling Algorithm for Unconditional Generation; Algorithm 5: Sampling Algorithm for Conditional Generation with Inpainting
Open Source Code | Yes | To ensure reproducibility, we make our datasets and all training, inference, and evaluation code available on GitHub at https://github.com/coleygroup/shepherd and https://github.com/coleygroup/shepherd-score.
Open Datasets | Yes | To ensure reproducibility, we make our datasets and all training, inference, and evaluation code available on GitHub at https://github.com/coleygroup/shepherd and https://github.com/coleygroup/shepherd-score. ... Our first dataset (ShEPhERD-GDB17) contains 2.8M molecules sampled from medicinally-relevant subsets of GDB17 (Ruddigkeit et al., 2012; Awale et al., 2019; Bühlmann & Reymond, 2020). Our second dataset (ShEPhERD-MOSES-aq) contains 1.6M drug-like molecules from MOSES (Polykovskiy et al., 2020).
Dataset Splits | No | The paper mentions evaluating on '100 random target molecules (held out from training)' for specific experiments, but it does not provide specific proportions or methodologies for the train/test/validation splits of its main datasets (ShEPhERD-GDB17 and ShEPhERD-MOSES-aq) in the main text or appendix.
Hardware Specification | Yes | Training. We train all models with V100 GPUs (32 GB memory). ... Inference. For each of the P(x1, x2), P(x1, x3), and P(x1, x3, x4) models, generating a batch of 10 independent samples (either unconditionally or via inpainting) takes approximately 3-4 minutes on a V100 GPU.
Software Dependencies | Yes | To evaluate their likelihood of retaining bioactivity, we use AutoDock Vina (Trott & Olson, 2010; Eberhardt et al., 2021) to dock the generated ligands... dock the molecule with AutoDock Vina v1.2.5... AutoDock Vina (v1.1.2)
Experiment Setup | Yes | We train ShEPhERD with the Adam optimizer using a constant learning rate of 3e-4 and an effective batch size ranging from 40 to 48. We clip gradients that have norm exceeding 5.0. ... We use T = 400 for all ShEPhERD models. Table 7 lists hyperparameters relevant to training ShEPhERD.
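The quoted optimizer settings (Adam, constant learning rate 3e-4, gradient-norm clipping at 5.0) can be sketched as a minimal PyTorch training step. The model and loss below are stand-in placeholders, not ShEPhERD's actual denoising network or diffusion objective; only the optimizer and clipping values come from the paper's quote.

```python
import torch

# Placeholder network standing in for ShEPhERD's denoising model.
model = torch.nn.Linear(16, 16)

# Adam with a constant learning rate of 3e-4, as quoted above.
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

def training_step(batch: torch.Tensor) -> float:
    optimizer.zero_grad()
    loss = model(batch).pow(2).mean()  # placeholder loss, not the diffusion loss
    loss.backward()
    # Clip gradients whose norm exceeds 5.0, per the quoted setup.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
    optimizer.step()
    return loss.item()

loss = training_step(torch.randn(4, 16))
```

The paper reports an effective batch size of 40 to 48, which in practice could come from gradient accumulation or multi-GPU training on the V100s listed under Hardware Specification.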