Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Accurately Predicting Protein Mutational Effects via a Hierarchical Many-Body Attention Network

Authors: Dahao Xu, Jiahua Rao, Mingming Zhu, Jixian Zhang, Wei Lu, Shuangjia Zheng, Yuedong Yang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results demonstrate state-of-the-art performance on multiple benchmarks. On the SKEMPI v2 dataset, H3-DDG achieves a Pearson correlation of 0.75, improving multi-point mutations prediction by 12.10%. On the challenging BindingGYM dataset, it outperforms Prompt-DDG and BA-DDG by 62.61% and 34.26%, respectively. Ablation and efficiency analyses demonstrate its robustness and scalability, while a case study on SARS-CoV-2 antibodies highlights its practical value in improving binding affinity for therapeutic design.
Researcher Affiliation	Collaboration	Dahao Xu (1), Jiahua Rao (1), Mingming Zhu (1), Jixian Zhang (2), Wei Lu (2), Shuangjia Zheng (3), Yuedong Yang (1). Affiliations: (1) Sun Yat-sen University; (2) Aureka Biotechnologies; (3) Shanghai Jiao Tong University. The paper also marks equal-contribution and corresponding authors.
Pseudocode	No	The paper describes the methodology using textual explanations and mathematical equations in Section 3, accompanied by diagrams in Figure 1, but it does not include a distinct pseudocode block or algorithm section.
Open Source Code	Yes	The code is available at https://github.com/biomed-AI/H3-DDG.
Open Datasets	Yes	Datasets. We used SKEMPI v2 [17], a benchmark with 7,085 mutations across 348 protein complexes, to evaluate ΔΔG prediction. Additionally, we evaluated on BindingGYM [24], the largest dataset for protein-protein interactions, with 508,962 curated entries and a high proportion of multi-point mutations.
Dataset Splits	Yes	On the SKEMPI v2 dataset, following prior work [26, 37], we split the data into three non-overlapping folds by complex to avoid data leakage. To evaluate generalization to unseen protein-protein interactions, we adopt the inter-assay split strategy from the BindingGYM dataset, following the approach of [24]. In this setting, assays are first clustered into five groups based on the sequences of their mutated proteins. Data from one cluster is held out for testing, while the remaining four are used for training.
Hardware Specification	Yes	Experiments were run on dual Xeon Gold 6248R CPUs and an RTX 4090 GPU under Ubuntu 22.04.
Software Dependencies	No	The paper mentions using the Adam optimizer and the ProteinMPNN module, but does not provide specific version numbers for these or any other software libraries or environments.
Experiment Setup	Yes	We used the Adam optimizer with a learning rate of 4e-4 and a batch size of 1 or 2, depending on GPU memory and graph size. The model was trained for 20,000 iterations with 4 attention heads and a hidden dimension of 128. The number of hyperedges was selected from {L/10, L/6, L/4}, and the number of edges in the 4-body attention module from {1N, 2N, 3N}, where L and N denote the numbers of residues and nodes, respectively. The pre-trained ProteinMPNN module used its default 3-layer configuration.
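
The Dataset Splits row describes a three-fold split "by complex", meaning that all mutations of a given protein complex fall into the same fold. The following is a minimal sketch of that idea, not the authors' implementation: the record field `complex_id`, the toy data, and the greedy size-balancing rule are all assumptions made for illustration, since the paper excerpt does not say how folds are balanced.

```python
from collections import defaultdict


def split_by_complex(records, n_folds=3):
    """Assign each complex, with all of its mutations, to exactly one fold.

    `records` is a list of dicts with a hypothetical `complex_id` key.
    """
    by_complex = defaultdict(list)
    for rec in records:
        by_complex[rec["complex_id"]].append(rec)

    folds = [[] for _ in range(n_folds)]
    # Greedy balancing (an assumption): place the largest complexes first,
    # each into whichever fold currently holds the fewest records.
    ordered = sorted(by_complex, key=lambda c: (-len(by_complex[c]), c))
    for cid in ordered:
        smallest = min(range(n_folds), key=lambda i: len(folds[i]))
        folds[smallest].extend(by_complex[cid])
    return folds


def cross_validation_rounds(folds):
    """Yield (train, test) pairs, holding out one fold per round."""
    for i, test in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, test


# Toy data: 40 mutations spread over 7 hypothetical complexes.
records = [{"complex_id": f"C{i % 7}", "mutation": f"m{i}"} for i in range(40)]
folds = split_by_complex(records)

for train, test in cross_validation_rounds(folds):
    train_ids = {r["complex_id"] for r in train}
    test_ids = {r["complex_id"] for r in test}
    # No complex appears on both sides, so no complex-level leakage.
    assert train_ids.isdisjoint(test_ids)
```

Grouping before splitting is what prevents leakage: a random split over individual mutations could place two mutations of the same complex in both training and test data.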