Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Repurposing AlphaFold3-like Protein Folding Models for Antibody Sequence and Structure Co-design

Authors: Nianzu Yang, Songlin Jiang, Jian Ma, Huaijin Wu, Shuangjia Zheng, Wengong Jin, Junchi Yan

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our benchmark results show that sequence-structure co-diffusion models not only surpass state-of-the-art antibody design methods in performance but also maintain structure prediction accuracy comparable to the original folding model. Notably, in the antibody co-design task, our method achieves a CDR-H3 recovery rate of 65% for typical antibodies, outperforming the baselines by 87%, and attains a remarkable 63% recovery rate for nanobodies. Our code is available at https://github.com/yangnianzu0515/MFDesign. 4 Experiments
Researcher Affiliation	Academia	(1) School of CS & School of AI, Shanghai Jiao Tong University; (2) Khoury College of Computer Sciences, Northeastern University; (3) School of AI & GIFT, Shanghai Jiao Tong University
Pseudocode	Yes	Algorithm 1: Sequence and Structure Co-diffusion Module (CoDiffusionModule); Algorithm S1: SAbDab Deduplication; Algorithm S2: Cropping Strategy Adopted in Our 2nd & 3rd & 4th Training Phases
Open Source Code	Yes	Our code is available at https://github.com/yangnianzu0515/MFDesign.
Open Datasets	Yes	The data for evaluation comes from the Structural Antibody Database (SAbDab) [Dunbar et al., 2014]. Boltz-1 itself is pre-trained on PDB structures released before the same cut-off date as AlphaFold3, i.e., September 30, 2021.
Dataset Splits	Yes	These clusters are then divided into training, validation, and test sets in a 9:0.5:0.5 ratio, ensuring that all samples released before the cut-off date are included in the training set. This results in 5,843 training samples, 187 validation samples, and 204 test samples, with the test set comprising 161 regular antibodies and 43 nanobodies.
Hardware Specification	Yes	All experiments run on a single node consisting of 8 H100 GPUs with 80 GB HBM3 each (aggregated GPU memory of 640 GB), 2 Intel Xeon Platinum 8468 processors comprised of 48 CPUs each (total 96 cores, 192 threads).
Software Dependencies	No	The paper mentions several tools and models like 'PyRosetta [Chaudhury et al., 2010]', 'MMSeq2 [Steinegger and Söding, 2017]', and 'ColabFold [Mirdita et al., 2022]', but does not provide specific version numbers for these software components.
Experiment Setup	Yes	1st-stage: A warm-up strategy is employed, where the learning rate increases linearly from 0 to a predefined value 5 × 10⁻⁴ over 10 steps. Subsequent to this warm-up phase, the learning rate decays by a factor of 0.999 every 100 steps. The max token number is set to 256. The batch size for a single GPU is 6, thus amounting to 48 samples per training step. This stage consists of a total of 5,000 training steps. 2nd-stage: Training continues for a total of 2,000 steps, starting with an initial learning rate that is set directly. The learning rate then decays by a factor of 0.999 every 100 steps. The max token number is set to 384, and the batch size for a single GPU is set to 3, resulting in 24 samples per training step. 3rd-stage: Training continues for a total of 5,000 steps, starting with an initial learning rate that is set directly. The learning rate then decays by a factor of 0.999 every 100 steps. The max token number is set to 512, and the batch size for a single GPU is set to 1, resulting in 8 samples per training step. 4th-stage: Training continues for a total of 9,000 steps, starting with an initial learning rate that is set directly. The learning rate then decays by a factor of 0.999 every 100 steps. The max token number is set to 512, and the batch size for a single GPU is set to 1, resulting in 8 samples per training step.
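
The four-stage schedule quoted in the Experiment Setup row can be sketched in a few lines of Python to check its arithmetic. This is an illustrative sketch, not the authors' training code: the names `Stage`, `STAGES`, and `learning_rate` are hypothetical, the peak learning rate of 5e-4 is an assumed reading of the value in the quoted text, and the choice to count decay steps from the end of warm-up is also an assumption. The quoted text does not state the initial learning rate for stages 2 to 4, so it is left as a parameter.

```python
from dataclasses import dataclass

NUM_GPUS = 8  # one node with 8 H100 GPUs, per the Hardware Specification row


@dataclass(frozen=True)
class Stage:
    """One training stage as described in the Experiment Setup row."""
    name: str
    steps: int
    max_tokens: int
    per_gpu_batch: int

    @property
    def global_batch(self) -> int:
        # Samples per training step across all GPUs.
        return self.per_gpu_batch * NUM_GPUS


STAGES = [
    Stage("1st", 5_000, 256, 6),
    Stage("2nd", 2_000, 384, 3),
    Stage("3rd", 5_000, 512, 1),
    Stage("4th", 9_000, 512, 1),
]


def learning_rate(step: int, base_lr: float = 5e-4, warmup_steps: int = 10,
                  decay: float = 0.999, decay_every: int = 100) -> float:
    """Linear warm-up followed by staircase decay.

    Assumption: decay intervals are counted from the end of warm-up.
    Pass warmup_steps=0 for stages whose initial rate is set directly.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * decay ** ((step - warmup_steps) // decay_every)


if __name__ == "__main__":
    for stage in STAGES:
        print(stage.name, stage.steps, stage.max_tokens, stage.global_batch)
    print("total steps:", sum(s.steps for s in STAGES))
```

Under these assumptions, the per-step sample counts come out to 48, 24, 8, and 8, matching the figures in the table, and the four stages add up to 21,000 training steps.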