Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Physics Aware Neural Networks for Unsupervised Binding Energy Prediction

Authors: Ke Liu, Hao Chen, Chunhua Shen

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments are conducted on the unsupervised protein-ligand binding energy prediction benchmarks, comparing them with previous works. Empirical results and theoretic analysis demonstrate that CEBind is more efficient and outperforms previous unsupervised models on benchmarks.
Researcher Affiliation	Academia	Zhejiang University, Hangzhou, China. Correspondence to: Hao Chen <EMAIL>.
Pseudocode	Yes	Algorithm 1: Training procedure (single data point); Algorithm 2: Training procedure of CEBind (single data point); Algorithm 3: Training procedure of DSMBind (single data point)
Open Source Code	No	The paper does not contain any explicit statements about releasing code, nor does it provide a link to a code repository.
Open Datasets	Yes	The protein-small molecule dataset contains 4806 protein-ligand complexes from PDBbind V2020 database for training (Su et al., 2018), 357 complexes randomly sampled from PDBbind in (Stärk et al., 2022) for evaluation, and 258 complexes from the PDBbind core set with labels of binding energy (Su et al., 2018) for test. The antibody-antigen dataset includes 3416 antibody-antigen complexes from the structural antibody database (SAbDab) (Schneider et al., 2022) for training, 116 complexes from CSM-sb (Myung et al., 2022) for evaluation, and 566 complexes with labels of binding affinity from SAbDab for test.
Dataset Splits	Yes	The protein-small molecule dataset contains 4806 protein-ligand complexes from PDBbind V2020 database for training (Su et al., 2018), 357 complexes randomly sampled from PDBbind in (Stärk et al., 2022) for evaluation, and 258 complexes from the PDBbind core set with labels of binding energy (Su et al., 2018) for test. The antibody-antigen dataset includes 3416 antibody-antigen complexes from the structural antibody database (SAbDab) (Schneider et al., 2022) for training, 116 complexes from CSM-sb (Myung et al., 2022) for evaluation, and 566 complexes with labels of binding affinity from SAbDab for test.
Hardware Specification	Yes	All our experiments are conducted on a computing cluster with 8 GPUs of NVIDIA GeForce RTX 4090 24GB and CPUs of AMD EPYC 7763 64-Core of 3.52GHz. All the inferences are conducted on a single GPU of NVIDIA GeForce RTX 4090 24GB.
Software Dependencies	Yes	We use the pre-trained ESM of version esm2_t36_3B_UR50D for protein residue embedding. We use the SRU (Lei et al., 2017) as our protein-ligand interaction modeling model following DSMBind.
Experiment Setup	Yes	We train CEBind, Gauss DSMBind, and DSMBind for 10 epochs. We train all the models with the same hyperparameters following DSMBind (Jin et al., 2024). The batch size, learning rate, and hidden vector size are 4, 1e-3, and 256, respectively. We assign the duration of t as a random number from 0 to 1.
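
The setup reported in the table above can be collected into a small configuration object for anyone attempting a re-run. The following is a minimal sketch, not the authors' code (the paper releases none): the names `ReportedSetup`, `REPORTED_SPLITS`, `sample_t`, and `steps_per_epoch` are hypothetical, "a random number from 0 to 1" is assumed to mean a uniform draw, and the last partial batch is assumed to be kept.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Hyperparameters quoted in the Experiment Setup row (names are hypothetical)."""
    epochs: int = 10
    batch_size: int = 4
    learning_rate: float = 1e-3
    hidden_size: int = 256


# Split sizes quoted in the Dataset Splits row, as (train, evaluation, test).
REPORTED_SPLITS = {
    "protein_small_molecule": (4806, 357, 258),
    "antibody_antigen": (3416, 116, 566),
}


def sample_t(rng: random.Random) -> float:
    """Draw t from [0, 1). Assumption: the paper's 'random number' is uniform."""
    return rng.random()


def steps_per_epoch(n_train: int, batch_size: int) -> int:
    """Ceiling division. Assumption: the final partial batch is not dropped."""
    return -(-n_train // batch_size)


if __name__ == "__main__":
    cfg = ReportedSetup()
    for name, (n_train, n_eval, n_test) in REPORTED_SPLITS.items():
        steps = steps_per_epoch(n_train, cfg.batch_size)
        print(f"{name}: {steps} steps/epoch, {steps * cfg.epochs} steps total")
```

Under these assumptions, the protein-small molecule training set of 4806 complexes gives 1202 optimizer steps per epoch at batch size 4, and the antibody-antigen set of 3416 gives 854.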