Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
DualMPNN: Harnessing Structural Alignments for High-Recovery Inverse Protein Folding
Authors: Xuhui Liao, Qiyu Wang, Zhiqiang Liang, Liwei Xiao, Junjie Chen
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive evaluations on CATH 4.2, TS50 and T500 benchmarks demonstrate DualMPNN achieves state-of-the-art recovery rates of 65.51%, 70.99%, and 70.37%, significantly outperforming base model ProteinMPNN by 15.64%, 16.56%, 12.29%, respectively. |
| Researcher Affiliation | Academia | Xuhui Liao, Qiyu Wang, Zhiqiang Liang, Liwei Xiao, Junjie Chen; School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China |
| Pseudocode | No | The shared MPNN module employs a hierarchical encoder-decoder architecture with interleaved message passing between structural encoding and sequence decoding stages. Node Update: (1) Construct message function. Collect the node features from k neighbors, stacked along the third dimension, to obtain $h_V^{\text{neigh}} \in \mathbb{R}^{N \times K \times d}$. Then expand the dims of $h_V$ for concatenation, $h_V \in \mathbb{R}^{N \times K \times d}$. Concatenate the features $h_{EV} = \mathrm{Cat}[h_V, h_V^{\text{neigh}}, h_E]$ and then construct the message function $m = \mathrm{MLP}(h_{EV}) \in \mathbb{R}^{N \times K \times d}$. (2) Passing step of messages. Utilizing the aggregated message to update nodes by the following functions: |
| Open Source Code | Yes | The code is available at https://github.com/chen-bioinfo/DualMPNN. |
| Open Datasets | Yes | We trained and evaluated DualMPNN on CATH, following the standardized data partition from prior work GraphTrans [17] and GraDe-IF [13]. ... Additionally, we tested our model on the T500 and TS50 datasets introduced by DenseCPD [29], which includes 9,888 structures for training and two distinct test datasets containing 50 (TS50) and 500 (T500) structures, respectively. |
| Dataset Splits | Yes | We trained and evaluated DualMPNN on CATH, following the standardized data partition from prior work GraphTrans [17] and GraDe-IF [13]. The proteins are categorized into a division of 18,024 proteins for training, 608 for validation, and 1,120 for testing. ... Additionally, we tested our model on the T500 and TS50 datasets introduced by DenseCPD [29], which includes 9,888 structures for training and two distinct test datasets containing 50 (TS50) and 500 (T500) structures, respectively. |
| Hardware Specification | No | Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [No] Justification: |
| Software Dependencies | No | In this study, we employed ProteinMPNN as the base model. ... Homologous templates are identified using Foldseek [27] against the Protein Data Bank (PDB). ... is structurally aligned to query structure via TM-align [28] ... We utilize Foldseek [27] to perform multiple structural alignments for a given query protein. ... the learning rate is scheduled by the Adam optimizer. ... we fold them with AlphaFold2 [15] and AlphaFold3 [35]. |
| Experiment Setup | Yes | The MPNN blocks possess a hidden dimension of 128 for the node and edge projections. The number of neighbors for each aggregated node is 48 in the query MPNN and 4 in the template MPNN. The interactive attention layer shares the same hidden dimension as the MPNN block. In addition, we utilize a dropout rate of 0.1 to avoid overfitting both in the MPNN block and in the attention layer. The model is trained on 40 epochs, and the learning rate is scheduled by the Adam optimizer. |
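
The message-construction step quoted in the Pseudocode row can be illustrated with a short NumPy sketch. This is only a reading of the quoted text, not the authors' implementation: the helper names (`init_mlp`, `gather_neighbors`, `build_messages`), the two-layer ReLU MLP, and the random initialization are assumptions made for illustration. The node-update step (2) is omitted because the quoted excerpt stops before giving its functions.

```python
import numpy as np

def init_mlp(rng, d_in, d_out):
    """Hypothetical two-layer MLP parameters (the paper's MLP layout is not quoted)."""
    W1 = rng.normal(scale=0.1, size=(d_in, d_out))
    b1 = np.zeros(d_out)
    W2 = rng.normal(scale=0.1, size=(d_out, d_out))
    b2 = np.zeros(d_out)
    return W1, b1, W2, b2

def gather_neighbors(h_V, nbr_idx):
    """Collect neighbor node features: (N, d) indexed by (N, K) -> (N, K, d)."""
    return h_V[nbr_idx]

def build_messages(h_V, h_E, nbr_idx, params):
    """Step (1): m = MLP(Cat[h_V, h_V^neigh, h_E]), returned with shape (N, K, d)."""
    W1, b1, W2, b2 = params
    h_neigh = gather_neighbors(h_V, nbr_idx)                   # (N, K, d)
    # Expand the node features along the neighbor axis so they can be concatenated.
    h_self = np.broadcast_to(h_V[:, None, :], h_neigh.shape)   # (N, K, d)
    h_EV = np.concatenate([h_self, h_neigh, h_E], axis=-1)     # (N, K, 3d)
    hidden = np.maximum(h_EV @ W1 + b1, 0.0)                   # ReLU (assumed activation)
    return hidden @ W2 + b2                                    # (N, K, d)

# Example with the hidden size and template-MPNN neighbor count from the Experiment Setup row.
rng = np.random.default_rng(0)
N, K, d = 10, 4, 128
h_V = rng.normal(size=(N, d))               # node features
h_E = rng.normal(size=(N, K, d))            # edge features
nbr_idx = rng.integers(0, N, size=(N, K))   # placeholder neighbor indices
m = build_messages(h_V, h_E, nbr_idx, init_mlp(rng, 3 * d, d))
print(m.shape)
```

Here the neighbor indices are random placeholders; in a real pipeline they would come from a k-nearest-neighbor graph over the protein structure.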