Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

DualMPNN: Harnessing Structural Alignments for High-Recovery Inverse Protein Folding

Authors: Xuhui Liao, Qiyu Wang, Zhiqiang Liang, Liwei Xiao, Junjie Chen

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Comprehensive evaluations on CATH 4.2, TS50 and T500 benchmarks demonstrate DualMPNN achieves state-of-the-art recovery rates of 65.51%, 70.99%, and 70.37%, significantly outperforming base model ProteinMPNN by 15.64%, 16.56%, 12.29%, respectively.
Researcher Affiliation	Academia	Xuhui Liao, Qiyu Wang, Zhiqiang Liang, Liwei Xiao, Junjie Chen School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China EMAIL EMAIL
Pseudocode	No	The shared MPNN module employs a hierarchical encoder-decoder architecture with interleaved message passing between structural encoding and sequence decoding stages. Node Update: (1) Construct message function. Collect the node features from k neighbors \(h_V^{\mathrm{neigh}}\), where $ denotes stacking along the third dim and \(h_V^{\mathrm{neigh}} \in \mathbb{R}^{N \times K \times d}\). Then expand the dims of \(h_V\) for concatenation, \(h_V \in \mathbb{R}^{N \times K \times d}\). Concatenates the features \(h_{EV} = \mathrm{Cat}[h_V, h_V^{\mathrm{neigh}}, h_E]\) and then constructs the message function \(m = \mathrm{MLP}(h_{EV}) \in \mathbb{R}^{N \times K \times d}\). (2) Passing step of messages. Utilizing the aggregated message to update nodes by the following functions:
Open Source Code	Yes	The code is available at https://github.com/chen-bioinfo/DualMPNN.
Open Datasets	Yes	We trained and evaluated DualMPNN on CATH, following the standardized data partition from prior work GraphTrans [17] and GraDe-IF [13]. ... Additionally, we tested our model on the T500 and TS50 datasets introduced by DenseCPD [29], which includes 9,888 structures for training and two distinct test datasets containing 50 (TS50) and 500 (T500) structures, respectively.
Dataset Splits	Yes	We trained and evaluated DualMPNN on CATH, following the standardized data partition from prior work GraphTrans [17] and GraDe-IF [13]. The proteins are categorized into a division of 18,024 proteins for training, 608 for validation, and 1,120 for testing. ... Additionally, we tested our model on the T500 and TS50 datasets introduced by DenseCPD [29], which includes 9,888 structures for training and two distinct test datasets containing 50 (TS50) and 500 (T500) structures, respectively.
Hardware Specification	No	Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [No] Justification:
Software Dependencies	No	In this study, we employed ProteinMPNN as the base model. ... Homologous templates are identified using Foldseek [27] against the Protein Data Bank (PDB). ... is structurally aligned to query structure via TM-align [28] ... We utilize Foldseek [27] to perform multiple structural alignments for a given query protein. ... the learning rate is scheduled by the Adam optimizer. ... we fold them with AlphaFold2 [15] and AlphaFold3 [35].
Experiment Setup	Yes	The MPNN blocks possess a hidden dimension of 128 for the node and edge projections. The number of neighbors for each aggregated node is 48 in the query MPNN and 4 in the template MPNN. The interactive attention layer shares the same hidden dimension as the MPNN block. In addition, we utilize a dropout rate of 0.1 to avoid overfitting both in the MPNN block and in the attention layer. The model is trained on 40 epochs, and the learning rate is scheduled by the Adam optimizer.
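
The message-construction step quoted in the Pseudocode row can be illustrated with a short shape-level sketch. This is not the authors' implementation: the function name `build_messages`, the two-layer ReLU MLP, the random weights, and the random neighbor indices are all assumptions made for illustration. Only the tensor shapes, the concatenation order, the hidden dimension of 128, and the template neighbor count of 4 come from the excerpts above. The sketch stops before the node update itself, because the quoted passage ends before giving those functions.

```python
import numpy as np


def build_messages(h_V, h_E, neighbor_idx, W1, b1, W2, b2):
    """Build per-edge messages m = MLP(Cat[h_V, h_V_neigh, h_E]).

    h_V:          (N, d)    node features
    h_E:          (N, K, d) edge features for each node's K neighbors
    neighbor_idx: (N, K)    integer indices of each node's K neighbors
    The two-layer MLP (W1, b1, W2, b2) is a hypothetical stand-in.
    """
    N, K = neighbor_idx.shape
    d = h_V.shape[1]
    # Gather neighbor node features: (N, K, d).
    h_neigh = h_V[neighbor_idx]
    # Expand each node's own features along the neighbor axis: (N, K, d).
    h_self = np.broadcast_to(h_V[:, None, :], (N, K, d))
    # Concatenate along the feature axis: (N, K, 3d).
    h_EV = np.concatenate([h_self, h_neigh, h_E], axis=-1)
    # Assumed MLP: linear, ReLU, linear, mapping 3d back to d.
    hidden = np.maximum(h_EV @ W1 + b1, 0.0)
    return hidden @ W2 + b2


# Toy usage: N is arbitrary; K and d follow the Experiment Setup row.
rng = np.random.default_rng(0)
N, K, d = 6, 4, 128
h_V = rng.standard_normal((N, d))
h_E = rng.standard_normal((N, K, d))
neighbor_idx = rng.integers(0, N, size=(N, K))  # placeholder neighbors
W1 = rng.standard_normal((3 * d, d)) * 0.01
b1 = np.zeros(d)
W2 = rng.standard_normal((d, d)) * 0.01
b2 = np.zeros(d)

m = build_messages(h_V, h_E, neighbor_idx, W1, b1, W2, b2)
print(m.shape)
```

In a real model the neighbor indices would come from a k-nearest-neighbor graph over residue coordinates rather than random draws, and the weights would be learned.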