DePLM: Denoising Protein Language Models for Property Optimization

Authors: Zeyuan Wang, Keyan Ding, Ming Qin, Xiaotong Li, Xiang Zhuang, Yu Zhao, Jianhua Yao, Qiang Zhang, Huajun Chen

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we extensively evaluate DePLM across various datasets and demonstrate its superior performance and robust generalization capabilities."
Researcher Affiliation | Collaboration | Zeyuan Wang (1,2), Keyan Ding (2), Ming Qin (1,2), Xiaotong Li (1,2), Xiang Zhuang (1,2), Yu Zhao (4), Jianhua Yao (4), Qiang Zhang (3,2), Huajun Chen (1,2); 1: College of Computer Science and Technology, Zhejiang University; 2: ZJU-Hangzhou Global Scientific and Technological Innovation Center; 3: The ZJU-UIUC Institute, International Campus, Zhejiang University; 4: Tencent AI Lab, Tencent
Pseudocode | Yes | "Algorithm 1: Constructing the Space of Rank Variables"
Open Source Code | Yes | "We also provide supplementary material (including code and data) to ensure the reproducibility of experimental results."
Open Datasets | Yes | "We conducted a thorough study across four benchmarks, including ProteinGym [44], β-Lactamase (Abbr., β-lact.) and Fluorescence (Abbr., Fluo.) from PEER [63], and GB1 (utilizing a 2-vs-rest split) from FLIP [8]. For our datasets, we employed ProteinGym [44] (MIT License), PEER [63] (Apache-2.0 license), and FLIP [8] (AFL-3.0 license)."
Dataset Splits | Yes | "We implemented the Random cross-validation method recommended by [46]. In this approach, each mutation in the dataset is randomly assigned to one of five folds. The model's performance is then evaluated by averaging the results across these five folds. Given a testing dataset, we randomly select an additional 40 datasets from the same category for training." (A sketch of this split procedure follows the table.)
Hardware Specification | Yes | "All models are trained on four Nvidia V100 32G GPUs for up to 100 epochs by default."
Software Dependencies | No | "We utilize AdamW as the optimizer."
Experiment Setup | Yes | "We set the learning rate at 0.0001, with a weight decay of 0.005, utilizing AdamW as the optimizer. All models are trained on four Nvidia V100 32G GPUs for up to 100 epochs by default." (A sketch of this training configuration follows the table.)
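
The Dataset Splits row above describes random five-fold cross-validation in which each mutation is randomly assigned to a fold and results are averaged across folds. The snippet below is a minimal sketch of that procedure, assuming mutations are simply indexed 0..N-1; the function name, seed, and placeholder metric are illustrative and are not taken from the paper or its code.

```python
import numpy as np

def assign_random_folds(num_mutations: int, num_folds: int = 5, seed: int = 0) -> np.ndarray:
    """Randomly assign each mutation to one of `num_folds` folds.

    Illustrative only: the paper states each mutation is randomly assigned to
    one of five folds and results are averaged across folds; the exact seeding
    and implementation are not specified (equal-size folds via a random
    permutation would be another common reading).
    """
    rng = np.random.default_rng(seed)
    return rng.integers(low=0, high=num_folds, size=num_mutations)

# Example: split 1,000 mutations and average a per-fold metric.
folds = assign_random_folds(1000)
per_fold_scores = []
for k in range(5):
    test_idx = np.where(folds == k)[0]
    train_idx = np.where(folds != k)[0]
    # ... train on train_idx, evaluate on test_idx, record the metric ...
    per_fold_scores.append(0.0)  # placeholder metric value
mean_score = float(np.mean(per_fold_scores))
```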
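The Experiment Setup row reports AdamW with a learning rate of 0.0001, a weight decay of 0.005, and training for up to 100 epochs on four V100 GPUs. The sketch below shows how such a configuration is typically wired up in PyTorch; the model, data loader, and loss are hypothetical placeholders and are not taken from the DePLM codebase.

```python
import torch
from torch.optim import AdamW

# Placeholder model and data loader; the report only quotes the
# hyperparameters, not the actual training script.
model = torch.nn.Linear(1280, 1)   # stand-in for the DePLM model
train_loader = []                  # stand-in for the training DataLoader

# Hyperparameters quoted in the Experiment Setup row.
optimizer = AdamW(model.parameters(), lr=1e-4, weight_decay=0.005)

for epoch in range(100):                      # "up to 100 epochs by default"
    for inputs, targets in train_loader:      # placeholder batches
        optimizer.zero_grad()
        preds = model(inputs)
        loss = torch.nn.functional.mse_loss(preds, targets)  # illustrative loss only
        loss.backward()
        optimizer.step()
```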