Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

NeuralPLexer3: Accurate Biomolecular Complex Structure Prediction with Flow Models

Authors: Jarren Zhuoran Qiao, Feizhi Ding, Thomas Dresselhaus, Mia Rosenfeld, Xiaotian Han, Owen Howell, Aniketh Iyengar, Stephen Opalenski, Anders Christensen, Sai Krishna Sirumalla, Fred Manby, Thomas K. Miller, Matthew Welborn

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Examined through existing and new benchmarks, NeuralPLexer3 excels in areas crucial to structure-based drug design, including blind docking, physical validity, and ligand-induced protein conformational changes. On the PoseBusters benchmark [9], NP3 achieves state-of-the-art accuracy in predicting protein-ligand complexes.
Researcher Affiliation	Industry	Zhuoran Qiao Iambic Therapeutics San Diego, CA 92121
Pseudocode	Yes	More details on model training and inference can be found in Algorithms S2 and S1. Altogether, these contributions led to two main advantages: (1) a substantial reduction in the number of integrator steps needed to sample from the model, leading to improved inference efficiency; and (2) alleviating the need for expensive diffusion rollouts [1] before each optimizer step, which both speeds up and simplifies the procedure for training the main model and confidence modules. Algorithm S1 Sampling from NP3 with symmetry-corrected flow. Algorithm S2 NP3 main model training iteration. Algorithm S3 Sampling from a Globular Polymer Prior via Short Langevin Dynamics
Open Source Code	No	Code for the NPBench benchmark is available on GitHub at https://github.com/iambic-therapeutics/np-bench. NPBench reference structures and corresponding model inferences are available on Zenodo at https://zenodo.org/records/14503936. Justification: The model code is not available. The code and data corresponding to new benchmarks introduced in this work are publicly available. We provide details necessary to reproduce the models in this work in the Supplementary Information.
Open Datasets	Yes	Code for the NPBench benchmark is available on GitHub at https://github.com/iambic-therapeutics/np-bench. NPBench reference structures and corresponding model inferences are available on Zenodo at https://zenodo.org/records/14503936. NPBench comprises 1,143 chains or interfaces derived from low-homology, high-resolution, and deduplicated PDB structures released after 2023.
Dataset Splits	Yes	NP3 is trained on all PDB structures deposited before September 1, 2020, along with additional synthetic datasets. The cutoff date is chosen to ensure no overlap with evaluation set structures. We filter to non-NMR entries with resolution better than 4.5 Å. To choose the most representative sample from these candidates with strong protection against overlap from training set, we perform a PDB clustering on chain/ligand/interface index for all structures released before 2023-01-12, with a maximum similarity cutoff of 40% sequence identity for polymers and 0.6 Tanimoto similarity for ligands and PTMs.
Hardware Specification	Yes	The NP3 production model was trained on 56 H100 GPUs hosted on a cluster that implements the NVIDIA NCP Reference Architecture for 24 days. Training is carried out using PyTorch FSDP [39] under BF16 automatic mixed precision. All inference results used either: one H100 GPU hosted on a cluster that implements the NVIDIA NCP Reference Architecture, or one L40S GPU hosted on an AWS g6e.xlarge instance.
Software Dependencies	Yes	Training is carried out using PyTorch FSDP [39] under BF16 automatic mixed precision. We use PyMOL [41] to align the selected Cα atoms and all ligand atoms in the predicted structure to those in the ground-truth structure with zero refinement cycles. The pocket-aligned ligand RMSD is computed using RDKit [42] CalcRMS between the aligned ligand and the reference ligand structures. [39] Ansel, J. et al. PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation in Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 2 (Association for Computing Machinery, New York, NY, USA, Apr. 2024), 929–947. ISBN: 9798400703850. [42] Landrum, G. RDKit: Open-Source Cheminformatics Software. https://github.com/rdkit/rdkit/releases/tag/Release_2024_09_1 (2024).
Experiment Setup	Yes	The NP3 production model was trained on 56 H100 GPUs hosted on a cluster that implements the NVIDIA NCP Reference Architecture for 24 days. Training is carried out using PyTorch FSDP [39] under BF16 automatic mixed precision. Over the full course of model training, we progressively scale the structure crop size to gradually capture longer contexts. The number of decoder replicas P is tuned for each stage: Maximum cropping size for first 80% training iterations: 384 anchors or 3072 atoms, P=32; For 10% of training iterations: 512 anchors or 4096 atoms, P=32; For 6% of training iterations: 640 anchors or 6400 atoms, P=24; For the last 4% of training iterations: 1024 anchors or 15360 atoms, P=8. Atom weight = 10.0 for the ligand of interest, when the training structure is generated based on a contiguous or spatial crop around a ligand-polymer interface; Atom weight = 1.0 for Cα and C1 atoms, and all rest ligand atoms; Atom weight = 0.0 for all side-chain atoms.
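
The staged crop schedule quoted in the Experiment Setup row can be read as a lookup from training progress to crop limits. The following is a minimal sketch of that reading, not the authors' implementation (the model code is not released). The names `CropStage`, `SCHEDULE`, and `stage_for_iteration` are hypothetical, and the sketch assumes that the four stages run consecutively in the order listed and that stage boundaries fall on exact percentages of the total iteration count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CropStage:
    """One stage of the crop schedule (hypothetical container)."""
    percent: int           # share of total training iterations, in percent
    max_anchors: int       # maximum crop size in anchors
    max_atoms: int         # maximum crop size in atoms
    decoder_replicas: int  # P, the number of decoder replicas


# Values taken from the quoted Experiment Setup text; the ordering is assumed.
SCHEDULE = (
    CropStage(80, 384, 3072, 32),
    CropStage(10, 512, 4096, 32),
    CropStage(6, 640, 6400, 24),
    CropStage(4, 1024, 15360, 8),
)


def stage_for_iteration(step: int, total_steps: int) -> CropStage:
    """Return the stage that a zero-based training step falls into."""
    if total_steps <= 0 or not 0 <= step < total_steps:
        raise ValueError("step must satisfy 0 <= step < total_steps")
    cumulative = 0
    for stage in SCHEDULE:
        cumulative += stage.percent
        # Integer comparison avoids floating-point drift at stage boundaries.
        if step * 100 < cumulative * total_steps:
            return stage
    return SCHEDULE[-1]


if __name__ == "__main__":
    total = 1000  # illustrative iteration count, not from the paper
    for step in (0, 800, 900, 960):
        s = stage_for_iteration(step, total)
        print(step, s.max_anchors, s.max_atoms, s.decoder_replicas)
```

The percentages sum to 100, so every step maps to exactly one stage. How the "anchors or atoms" limits interact when both apply is not specified in the quoted text, so the sketch only returns both limits and leaves that choice to the caller.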