DePLM: Denoising Protein Language Models for Property Optimization
Authors: Zeyuan Wang, Keyan Ding, Ming Qin, Xiaotong Li, Xiang Zhuang, Yu Zhao, Jianhua Yao, Qiang Zhang, Huajun Chen
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we extensively evaluate DePLM across various datasets and demonstrate its superior performance and robust generalization capabilities. |
| Researcher Affiliation | Collaboration | Zeyuan Wang (1,2), Keyan Ding (2), Ming Qin (1,2), Xiaotong Li (1,2), Xiang Zhuang (1,2), Yu Zhao (4), Jianhua Yao (4), Qiang Zhang (3,2), Huajun Chen (1,2); (1) College of Computer Science and Technology, Zhejiang University; (2) ZJU-Hangzhou Global Scientific and Technological Innovation Center; (3) The ZJU-UIUC Institute, International Campus, Zhejiang University; (4) Tencent AI Lab, Tencent |
| Pseudocode | Yes | Algorithm 1 Constructing the Space of Rank Variables |
| Open Source Code | Yes | We also provide supplementary material (including code and data) to ensure the reproducibility of experimental results. |
| Open Datasets | Yes | We conducted a thorough study across four benchmarks, including ProteinGym [44], β-Lactamase (Abbr., β-lact.) and Fluorescence (Abbr., Fluo.) from PEER [63], and GB1 (utilizing a 2-vs-rest split) from FLIP [8]. For our datasets, we employed ProteinGym [44] [MIT License], PEER [63] [Apache-2.0 license], FLIP [8] [AFL-3.0 license]. |
| Dataset Splits | Yes | We implemented the Random cross-validation method recommended by [46]. In this approach, each mutation in the dataset is randomly assigned to one of five folds. The model's performance is then evaluated by averaging the results across these five folds. Given a testing dataset, we randomly select an additional 40 datasets from the same category for training. (A fold-assignment sketch follows the table.) |
| Hardware Specification | Yes | All models are trained on four Nvidia V100 32G GPUs for up to 100 epochs by default. |
| Software Dependencies | No | We utilize AdamW as the optimizer. |
| Experiment Setup | Yes | We set the learning rate at 0.0001, with a weight decay of 0.005, utilizing AdamW as the optimizer. All models are trained on four Nvidia V100 32G GPUs for up to 100 epochs by default. (A training-configuration sketch follows the table.) |
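The Dataset Splits row quotes a random cross-validation protocol: each mutation is randomly assigned to one of five folds, and results are averaged across folds. The following is a minimal sketch of that fold assignment, not the authors' released code; the `evaluate` call in the usage comment is a hypothetical placeholder.

```python
import numpy as np
from sklearn.model_selection import KFold

def random_five_fold(num_mutations: int, seed: int = 0):
    """Randomly assign every mutation index to one of five folds and
    yield (train_indices, test_indices) pairs, one pair per fold."""
    kf = KFold(n_splits=5, shuffle=True, random_state=seed)
    yield from kf.split(np.arange(num_mutations))

# Hypothetical usage: score each held-out fold, then average the five results.
# fold_scores = [evaluate(train_idx, test_idx)  # evaluate() is a placeholder
#                for train_idx, test_idx in random_five_fold(num_mutations)]
# mean_score = float(np.mean(fold_scores))
```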
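The Experiment Setup and Hardware rows quote the training configuration: AdamW with a learning rate of 0.0001 and weight decay of 0.005, trained for up to 100 epochs on four V100 GPUs. Below is a minimal PyTorch sketch of that optimizer and loop configuration; `model` and `train_loader` are assumed placeholders, the forward pass is schematic, and multi-GPU details are omitted.

```python
import torch
from torch.optim import AdamW

def train(model: torch.nn.Module, train_loader, device: str = "cuda", epochs: int = 100):
    """Sketch of the quoted setup: AdamW, lr=1e-4, weight decay=0.005,
    up to 100 epochs. The loss computation below is a placeholder; the
    real objective is model-specific."""
    model.to(device)
    optimizer = AdamW(model.parameters(), lr=1e-4, weight_decay=0.005)
    for _ in range(epochs):
        model.train()
        for batch in train_loader:
            optimizer.zero_grad()
            loss = model(batch.to(device))  # placeholder forward pass returning a scalar loss
            loss.backward()
            optimizer.step()
```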