Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Joint Design of Protein Surface and Backbone Using a Diffusion Bridge Model

Authors: Guanlue Li, Xufeng Zhao, Fang Wu, Sören Laue

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive validation across diverse protein design scenarios demonstrates PepBridge's efficacy in generating structurally viable proteins, representing a significant advancement in the joint design of top-down protein structure. This section presents comprehensive experimental evaluations to demonstrate the efficacy of our proposed method.
Researcher Affiliation	Academia	Guanlue Li University of Hamburg Hamburg, Germany EMAIL Xufeng Zhao University of Hamburg Hamburg, Germany EMAIL Fang Wu Stanford University Stanford, CA, USA EMAIL Sören Laue University of Hamburg Hamburg, Germany EMAIL
Pseudocode	No	The paper describes its methodology through mathematical formulations and textual explanations within sections like 'Surface Diffusion Bridge' and 'Bottom Structure Diffusion Generation' but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	The code can be found at https://github.com/guanlueli/Pepbridge. We provide an anonymous link to the code and also include it in the supplementary materials.
Open Datasets	Yes	The evaluation utilized the PepMerge dataset [25], a collection derived from the integration of PepBDB [52] and Q-BioLiP [51] databases.
Dataset Splits	Yes	The filtered dataset underwent sequence-based clustering using MMseqs2 [45], resulting in 9,816 protein-peptide complexes organized into 292 distinct clusters. For systematic evaluation, we designated 10 clusters encompassing 158 complexes as the test set, with the remaining complexes allocated to training and validation cohorts.
Hardware Specification	Yes	The experiments were conducted on a computing cluster with 2 NVIDIA RTX A6000, each with 48 GB of memory.
Software Dependencies	No	The paper mentions software like PyMOL [10] and MMseqs2 [45] but does not provide specific version numbers for these or any other key software dependencies required for reproducibility.
Experiment Setup	Yes	The total computation time for training was approximately 21 hours. We trained for 900000 steps with batch size 8. We used the Adam optimizer with a start learning rate of 5e-4. We also schedule to decay the learning rate exponentially with a factor of 0.6 and a minimum learning rate of 1e-6. The learning rate is decayed if there is no improvement for the validation loss in 10 consecutive evaluations.
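
The learning-rate schedule quoted in the Experiment Setup row can be read as a plateau-based decay: multiply the rate by 0.6 whenever the validation loss has not improved for 10 consecutive evaluations, never dropping below 1e-6. The following is a minimal pure-Python sketch of that reading, not the authors' implementation. The class name `PlateauDecay`, its method names, and the exact trigger rule (decay on the 10th non-improving evaluation, then reset the counter) are assumptions made for illustration; only the numeric values come from the quoted text.

```python
class PlateauDecay:
    """Hypothetical helper illustrating the quoted schedule (not the paper's code).

    Values taken from the Experiment Setup row: start rate 5e-4,
    decay factor 0.6, patience of 10 evaluations, floor of 1e-6.
    """

    def __init__(self, lr=5e-4, factor=0.6, patience=10, min_lr=1e-6):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = float("inf")  # best validation loss seen so far
        self.bad_evals = 0        # consecutive evaluations without improvement

    def step(self, val_loss):
        """Record one validation result and return the (possibly decayed) rate."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
            # Assumption: decay fires on the `patience`-th non-improving evaluation.
            if self.bad_evals >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_evals = 0
        return self.lr


if __name__ == "__main__":
    sched = PlateauDecay()
    sched.step(1.0)              # first evaluation sets the baseline
    for _ in range(10):          # ten evaluations with no improvement
        lr = sched.step(1.0)
    print(f"learning rate after one plateau: {lr:.2e}")
```

In a training loop, `step` would be called once per validation pass and the returned value written into the optimizer. Deep learning frameworks ship comparable schedulers, but their trigger conditions can differ by one evaluation from the rule assumed here, so the exact behavior should be checked against the released code.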