Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Venus-MAXWELL: Efficient Learning of Protein-Mutation Stability Landscapes using Protein Language Models
Authors: Yuanxi Yu, Fan Jiang, Xinzhu Ma, Liang Zhang, Bozitao Zhong, Wanli Ouyang, Guisheng Fan, Huiqun Yu, Liang Hong, Mingchen Li
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated Venus-MAXWELL's performance on the Test12K dataset (Section 4), comprising 12,443 mutations across 308 diverse proteins. As shown in Figure 4 (A), Venus-MAXWELL enhances zero-shot performance across multiple PLMs, achieving an average Spearman correlation improvement of 0.143 (Wilcoxon signed-rank test, p < 0.01). Table 1: Efficiency comparison of stability prediction methods. Table 2: Ablation studies of Venus-MAXWELL (ESM-IF) on Test12K datasets. |
| Researcher Affiliation | Academia | 1 Institute of Natural Sciences, Shanghai Jiao Tong University, China; 2 Zhangjiang Institute for Advanced Study, Shanghai Jiao Tong University, China; 3 Shanghai Artificial Intelligence Laboratory, China; 4 School of Information Science and Engineering, East China University of Science and Technology, China. Corresponding authors: Mingchen Li (EMAIL) and Liang Hong (EMAIL). These authors contributed equally to this work. |
| Pseudocode | Yes | We also provide the algorithm and pseudo-code of Venus-MAXWELL for training and landscape prediction in Appendix A.1 and Appendix A.2. Algorithm 1 outlines the training procedure for Venus-MAXWELL. The process iterates through the dataset, computing the predicted mutation landscape L_i from the PLM's log-probabilities for each protein A_i. The model parameters θ are then updated using gradient descent based on a combined loss function that incorporates both ranking (L_Ranking) and absolute value (L_MSE) objectives. Algorithm 2 details the prediction process using a PLM fine-tuned by Venus-MAXWELL. Given a protein sequence A, the model performs a single forward pass to compute the log-probability matrix and subsequently derives the mutation ΔΔG landscape L based on Equation 8. Algorithm 3 provides a PyTorch-like pseudocode implementation of the core training loop for Venus-MAXWELL. It illustrates how input sequences are processed, how logits are computed and transformed into the mutation landscape matrix, and how the ranking and MSE losses are calculated and combined for backpropagation and parameter updates. |
| Open Source Code | Yes | The training codes, model weights, and datasets are publicly available at https://github.com/ai4protein/Venus-MAXWELL. |
| Open Datasets | Yes | Besides, to facilitate future works, we also curated a large-scale ΔΔG dataset with strict controls on data leakage and redundancy to ensure robust evaluation. The training codes, model weights, and datasets are publicly available at https://github.com/ai4protein/Venus-MAXWELL. |
| Dataset Splits | Yes | As a result, we obtained strictly separated training and test sets, containing over 226K and 12K curated mutation entries, respectively, ensuring a fair evaluation of generalization. Test set. The test set is a compilation of mutation ΔΔG data that we are currently able to collect, including P53 [17], Myoglobin [17], SSym [17], S669 [41], S8754 [42], M1261 [42], vb1432 [43], Fireprotdb [44] and Thermomutdb [45]. ... After removing duplicates, the test set contains 12,443 mutations across 308 proteins, namely 308 sparse mutation ΔΔG landscapes. This dataset, named Test12K... Training Set. The training set is derived from a large-scale dataset containing 272K protein mutation sequences, denoted as cDNA272K [46]... The resulting training set, Train226K, consists of 226K sequences with less than 30% sequence similarity to any sequences in the test set. All key hyperparameters were selected through a rigorous 5-fold cross-validation procedure on the training set (Train226K) to prevent test set leakage. |
| Hardware Specification | Yes | All experiments were conducted on a PC with an NVIDIA RTX 4090 GPU. |
| Software Dependencies | No | The paper mentions "Algorithm 3 provides a PyTorch-like pseudocode implementation" and "For all Venus-MAXWELL enhanced models, we utilized the Adam optimizer [54]", which implies that PyTorch and the Adam optimizer are used. However, it does not specify version numbers for these software components or for any other libraries. |
| Experiment Setup | Yes | For all Venus-MAXWELL enhanced models, we utilized the Adam optimizer [54]. All key hyperparameters were selected through a rigorous 5-fold cross-validation procedure on the training set (Train226K) to prevent test set leakage. Based on this analysis, we set the initial learning rate to 5 × 10⁻⁵, the loss weighting factor λ to 0.1, and the MLP hidden dimension D to V (the PLM's vocabulary size). The optimal number of training epochs (approximately 7, determined via early stopping with a patience of 5 epochs) was also identified during this cross-validation process. |
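
The Experiment Setup row says the epoch count was chosen by early stopping with a patience of 5 epochs. The following is a minimal sketch of that stopping rule only, not the authors' code. The function name `early_stopping_epoch`, the use of a validation loss as the monitored quantity, and the example numbers are all assumptions made for illustration; the paper may monitor a different metric.

```python
from typing import Iterable, Tuple


def early_stopping_epoch(
    val_losses: Iterable[float],
    patience: int = 5,
) -> Tuple[int, float]:
    """Return (best_epoch, best_loss) for a stream of per-epoch validation losses.

    Epochs are 1-indexed. Iteration stops once `patience` consecutive epochs
    pass without improving on the best loss seen so far. The monitored
    quantity (a loss to be minimised) is an assumption of this sketch.
    """
    best_epoch, best_loss = 0, float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # New best: remember it and reset the patience counter.
            best_epoch, best_loss = epoch, loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # later epochs are never evaluated
    return best_epoch, best_loss


# Hypothetical validation losses, invented purely for illustration.
example_losses = [0.9, 0.8, 0.7, 0.65, 0.6, 0.58, 0.55,
                  0.56, 0.57, 0.56, 0.58, 0.59, 0.50]
best_epoch, best_loss = early_stopping_epoch(example_losses, patience=5)
print(best_epoch, best_loss)
```

With these made-up numbers the best epoch is 7. Epochs 8 through 12 fail to improve on it, so the loop stops before reaching the lower value at epoch 13, which shows the trade-off a fixed patience implies.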