Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

ForceFM: Enhancing Protein-Ligand Predictions through Force-Guided Flow Matching

Authors: Huanlei Guo, Song Liu, Bingyi Jing

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on the PDBBind dataset demonstrate that our model outperforms existing methods in both docking accuracy and physical plausibility, consistently generating more realistic ligand poses. Moreover, we evaluate the generalization of our approach across multiple energy functions, confirming its broader applicability in various docking scenarios.
Researcher Affiliation	Academia	(1) Department of Statistics and Data Science, Southern University of Science and Technology; (2) School of Artificial Intelligence, The Chinese University of Hong Kong, Shenzhen; (3) Shenzhen Loop Area Institute. Corresponding authors: Bingyi Jing (EMAIL).
Pseudocode	Yes	Algorithm 1: Training Procedure (Single Epoch) for Force Model
Open Source Code	Yes	Code is available at https://github.com/Guhuary/ForceFM.
Open Datasets	Yes	Dataset: We utilized protein-ligand complexes from PDBBind, originating from the Protein Data Bank (PDB) [35].
Dataset Splits	Yes	Adopting the time-based splitting in [6], we trained our model on 17k complexes up to 2018 and tested on 363 structures in 2019, ensuring no ligand overlaps.
Hardware Specification	Yes	All statistics are averaged over three random seeds. We implemented all the models using the open-source Python library PyTorch and e3nn [37], and the experiments were conducted on a PC equipped with 4 NVIDIA A100-40GB GPUs.
Software Dependencies	No	We implemented all the models using the open-source Python library PyTorch and e3nn [37], and the experiments were conducted on a PC equipped with 4 NVIDIA A100-40GB GPUs.
Experiment Setup	Yes	Both baseline model and force model employ the AdamW optimizer with a learning rate of 5e-5 and weight decay 1e-4. The learning rate is controlled by a cosine annealing scheduler with minimum lr = 1e-6. During inference, we use the exponential moving average of weights, updated after each optimization step with a decay factor of 0.999. [...] Training Duration: The model is trained over 2000 epochs on the PDBBind dataset, with a batch size of 8 and a dropout rate of 0.1. [...] Hyper-parameters: The inverse of the temperature k = 1/10, η = 1. The number of samples K for expectation approximation in Eqn. (12) is 40.
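The Dataset Splits and Experiment Setup rows can be illustrated with a small plain-Python sketch. It is not taken from the ForceFM repository: the `Complex` record, its `ligand_id` field, the toy PDB IDs, and the helper names `time_split`, `cosine_annealing_lr`, and `ema_update` are all hypothetical, and dropping training complexes that share a ligand identifier with the test set is only one assumed way to enforce "no ligand overlaps". The schedule uses the closed form followed by PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR` (starting at 5e-5 and annealing to `eta_min` = 1e-6); treating one schedule period as the 2000 training epochs is an assumption. The EMA step applies the reported decay of 0.999.

```python
import math
from dataclasses import dataclass


# Hypothetical record type; real PDBBind metadata fields may differ.
@dataclass(frozen=True)
class Complex:
    pdb_id: str
    year: int
    ligand_id: str  # assumed ligand identifier used for the overlap check


def time_split(complexes, last_train_year=2018, test_year=2019):
    """Time-based split: test on `test_year`, train on complexes up to
    `last_train_year`, dropping training complexes whose ligand_id also
    occurs in the test set (an assumed form of the no-overlap rule)."""
    test = [c for c in complexes if c.year == test_year]
    test_ligands = {c.ligand_id for c in test}
    train = [
        c for c in complexes
        if c.year <= last_train_year and c.ligand_id not in test_ligands
    ]
    return train, test


def cosine_annealing_lr(step, total_steps, lr_max=5e-5, lr_min=1e-6):
    """Cosine annealing from lr_max (step 0) down to lr_min (step total_steps)."""
    progress = min(step, total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


def ema_update(shadow, params, decay=0.999):
    """One EMA step, run after each optimizer step:
    shadow <- decay * shadow + (1 - decay) * current parameter value."""
    return {name: decay * shadow[name] + (1.0 - decay) * value
            for name, value in params.items()}


if __name__ == "__main__":
    data = [
        Complex("1abc", 2015, "LIG1"),
        Complex("2def", 2018, "LIG2"),  # dropped: LIG2 also appears in 2019
        Complex("3ghi", 2019, "LIG2"),
        Complex("4jkl", 2019, "LIG3"),
        Complex("5mno", 2020, "LIG4"),  # outside both splits
    ]
    train, test = time_split(data)
    print([c.pdb_id for c in train])  # ['1abc']
    print([c.pdb_id for c in test])   # ['3ghi', '4jkl']
    for epoch in (0, 1000, 2000):
        print(epoch, cosine_annealing_lr(epoch, total_steps=2000))
    print(ema_update({"w": 1.0}, {"w": 0.0}))
```

In an actual PyTorch training loop, the optimizer would be constructed as `torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=1e-4)`; the plain-Python functions here only make the schedule and averaging arithmetic explicit.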