Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Venus-MAXWELL: Efficient Learning of Protein-Mutation Stability Landscapes using Protein Language Models

Authors: Yuanxi Yu, Fan Jiang, Xinzhu Ma, Liang Zhang, Bozitao Zhong, Wanli Ouyang, Guisheng Fan, Huiqun Yu, Liang Hong, Mingchen Li

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluated Venus-MAXWELL's performance on the Test12K dataset (Section 4), comprising 12,443 mutations across 308 diverse proteins. As shown in Figure 4 (A), Venus-MAXWELL enhances zero-shot performance across multiple PLMs, achieving an average Spearman correlation improvement of 0.143 (Wilcoxon signed-rank test, p < 0.01). Table 1: Efficiency comparison of stability prediction methods. Table 2: Ablation studies of Venus-MAXWELL (ESM-IF) on the Test12K dataset.
Researcher Affiliation	Academia	1 Institute of Natural Sciences, Shanghai Jiao Tong University, China; 2 Zhangjiang Institute for Advanced Study, Shanghai Jiao Tong University, China; 3 Shanghai Artificial Intelligence Laboratory, China; 4 School of Information Science and Engineering, East China University of Science and Technology, China. Corresponding authors: Mingchen Li (EMAIL) and Liang Hong (EMAIL). These authors contributed equally to this work.
Pseudocode	Yes	We also provide the algorithm and pseudocode of Venus-MAXWELL for training and landscape prediction in Appendix A.1 and Appendix A.2. Algorithm 1 outlines the training procedure for Venus-MAXWELL. The process iterates through the dataset, computing the predicted mutation landscape L_i from the PLM's log-probabilities for each protein A_i. The model parameters θ are then updated by gradient descent on a combined loss function that incorporates both a ranking objective (L_Ranking) and an absolute-value objective (L_MSE). Algorithm 2 details the prediction process using a PLM fine-tuned by Venus-MAXWELL. Given a protein sequence A, the model performs a single forward pass to compute the log-probability matrix and then derives the mutation ΔΔG landscape L based on Equation 8. Algorithm 3 provides a PyTorch-like pseudocode implementation of the core training loop for Venus-MAXWELL. It illustrates how input sequences are processed, how logits are computed and transformed into the mutation landscape matrix, and how the ranking and MSE losses are calculated and combined for backpropagation and parameter updates.
Open Source Code	Yes	The training codes, model weights, and datasets are publicly available at https://github.com/ai4protein/Venus-MAXWELL.
Open Datasets	Yes	In addition, to facilitate future work, we curated a large-scale ΔΔG dataset with strict controls on data leakage and redundancy to ensure robust evaluation. The training codes, model weights, and datasets are publicly available at https://github.com/ai4protein/Venus-MAXWELL.
Dataset Splits	Yes	As a result, we obtained strictly separated training and test sets, containing over 226K and 12K curated mutation entries, respectively, ensuring a fair evaluation of generalization. Test set. The test set is a compilation of the mutation ΔΔG data that we are currently able to collect, including P53 [17], Myoglobin [17], SSym [17], S669 [41], S8754 [42], M1261 [42], vb1432 [43], Fireprotdb [44] and Thermomutdb [45]. ... After removing duplicates, the test set contains 12,443 mutations across 308 proteins, i.e., 308 sparse mutation ΔΔG landscapes. This dataset, named Test12K... Training set. The training set is derived from a large-scale dataset containing 272K protein mutation sequences, denoted cDNA272K [46]... The resulting training set, Train226K, consists of 226K sequences with less than 30% sequence similarity to any sequence in the test set. All key hyperparameters were selected through a rigorous 5-fold cross-validation procedure on the training set (Train226K) to prevent test set leakage.
Hardware Specification	Yes	All experiments were conducted on a PC with an NVIDIA RTX 4090 GPU.
Software Dependencies	No	The paper mentions "Algorithm 3 provides a Pytorch-like pseudocode implementation" and "For all Venus-MAXWELL enhanced models, we utilized the Adam optimizer [54]", which implies that PyTorch and the Adam optimizer are used. However, it does not specify version numbers for these or any other software libraries.
Experiment Setup	Yes	For all Venus-MAXWELL enhanced models, we utilized the Adam optimizer [54]. All key hyperparameters were selected through a rigorous 5-fold cross-validation procedure on the training set (Train226K) to prevent test set leakage. Based on this analysis, we set the initial learning rate to 5 × 10⁻⁵, the loss weighting factor λ to 0.1, and the MLP hidden dimension D to V (the PLM's vocabulary size). The optimal number of training epochs (approximately 7, determined via early stopping with a patience of 5 epochs) was also identified during this cross-validation process.
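The Research Type row reports Spearman correlations on Test12K. The sketch below shows one plausible way to compute such a score. It assumes the correlation is computed within each protein's sparse landscape and then averaged over proteins. The excerpt does not say exactly how the 0.143 average is formed. All function names here are hypothetical.

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    # Spearman's rho is the Pearson correlation of the ranks.
    return pearson(rank(x), rank(y))


def mean_per_protein_spearman(predictions, targets):
    """predictions / targets: dict mapping protein id -> per-mutation scores (hypothetical layout)."""
    return mean(spearman(predictions[p], targets[p]) for p in targets)
```

Paired per-protein correlations from a baseline PLM and its enhanced counterpart could then be compared with a Wilcoxon signed-rank test, for example `scipy.stats.wilcoxon`. The helper assumes every protein has at least two mutations with non-constant scores. Otherwise the correlation is undefined and the division fails.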
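The Pseudocode row describes two steps. First, a single forward pass turns the PLM's log-probability matrix into a mutation landscape. Second, training scores that landscape with a ranking term plus an MSE term. The sketch below illustrates both steps under stated assumptions:

- It uses the common wild-type log-odds rule (L[i][aa] = log p(aa) − log p(wild type)) in place of Equation 8, which the excerpt does not reproduce.
- It uses a RankNet-style pairwise logistic loss as a stand-in for L_Ranking.
- It assumes λ scales the MSE term.
- It computes only the loss value. The gradient step would need an autodiff framework such as PyTorch.

All names are hypothetical.

```python
import math

# Hypothetical 20-letter alphabet; a real PLM vocabulary also contains special tokens.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def landscape_from_log_probs(log_probs, wild_type):
    """log_probs[i][aa]: log-probability of amino acid aa at position i (one forward pass).

    Assumed scoring rule (Equation 8 may differ): L[i][aa] = log p(aa) - log p(wild type).
    """
    return [
        {aa: row[aa] - row[wt] for aa in AMINO_ACIDS}
        for row, wt in zip(log_probs, wild_type)
    ]


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mse_loss(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def pairwise_ranking_loss(pred, target):
    """Assumed RankNet-style loss averaged over ordered pairs with target[i] > target[j]."""
    total, n_pairs = 0.0, 0
    for i in range(len(pred)):
        for j in range(len(pred)):
            if target[i] > target[j]:
                # log(1 + exp(-(pred[i] - pred[j]))): small when pred[i] is well above pred[j].
                total += softplus(pred[j] - pred[i])
                n_pairs += 1
    return total / n_pairs if n_pairs else 0.0


def combined_loss(landscape, measured, lam=0.1):
    """measured: (position, mutant_aa, target) triples for the sparse labelled mutations.

    Assumes lam weights the MSE term; the excerpt does not say which term lambda scales.
    """
    pred = [landscape[pos][aa] for pos, aa, _ in measured]
    target = [t for _, _, t in measured]
    return pairwise_ranking_loss(pred, target) + lam * mse_loss(pred, target)
```

Only the labelled entries of each sparse landscape enter the loss, which is why `combined_loss` looks up predictions by (position, mutant) rather than using the full matrix.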
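The Dataset Splits row describes two safeguards. Training sequences are kept only if they share less than 30% sequence similarity with every test sequence, and hyperparameters are tuned by 5-fold cross-validation on Train226K. The sketch below illustrates both ideas, with two caveats:

- `difflib.SequenceMatcher` is only a crude stand-in for sequence identity. Real pipelines typically use alignment or clustering tools such as MMseqs2 or CD-HIT, and the excerpt does not name the tool used.
- Grouping folds by protein is an assumption.

Function names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Crude stand-in for pairwise sequence identity; NOT an alignment-based identity score.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def filter_training_set(train_seqs, test_seqs, threshold=0.30):
    """Keep training sequences whose similarity to every test sequence is below threshold."""
    return [
        s for s in train_seqs
        if all(similarity(s, t) < threshold for t in test_seqs)
    ]


def protein_kfold(protein_ids, k=5):
    """Assign whole proteins to k folds round-robin, so no protein appears in two folds.

    Returns a list of k lists of entry indices (assumed grouping, not stated in the text).
    """
    unique = sorted(set(protein_ids))
    fold_of = {p: i % k for i, p in enumerate(unique)}
    return [
        [idx for idx, p in enumerate(protein_ids) if fold_of[p] == f]
        for f in range(k)
    ]
```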
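The Experiment Setup row lists the reported hyperparameters. The sketch below collects them in a config object and shows an early-stopping rule with a patience of 5 epochs. From the text come the learning rate, λ, D = V and the patience. `TrainConfig`, `early_stopping_epoch` and the default `vocab_size` of 33 are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TrainConfig:
    learning_rate: float = 5e-5  # initial Adam learning rate reported in the text
    loss_lambda: float = 0.1     # loss weighting factor lambda
    vocab_size: int = 33         # hypothetical default; set to the PLM's vocabulary size V
    patience: int = 5            # early-stopping patience in epochs

    @property
    def mlp_hidden_dim(self) -> int:
        # The text sets the MLP hidden dimension D equal to V.
        return self.vocab_size


def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch with the lowest validation loss, stopping once
    `patience` consecutive epochs fail to improve on the best loss so far."""
    best_loss, best_epoch, bad_epochs = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return best_epoch
```

In a PyTorch training script, the learning rate would typically be passed as `torch.optim.Adam(model.parameters(), lr=cfg.learning_rate)`.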