DePLM: Denoising Protein Language Models for Property Optimization
Authors: Zeyuan Wang, Keyan Ding, Ming Qin, Xiaotong Li, Xiang Zhuang, Yu Zhao, Jianhua Yao, Qiang Zhang, Huajun Chen
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we extensively evaluate DePLM across various datasets and demonstrate its superior performance and robust generalization capabilities. |
| Researcher Affiliation | Collaboration | Zeyuan Wang (1,2), Keyan Ding (2), Ming Qin (1,2), Xiaotong Li (1,2), Xiang Zhuang (1,2), Yu Zhao (4), Jianhua Yao (4), Qiang Zhang (3,2), Huajun Chen (1,2); (1) College of Computer Science and Technology, Zhejiang University; (2) ZJU-Hangzhou Global Scientific and Technological Innovation Center; (3) The ZJU-UIUC Institute, International Campus, Zhejiang University; (4) Tencent AI Lab, Tencent |
| Pseudocode | Yes | Algorithm 1 Constructing the Space of Rank Variables |
| Open Source Code | Yes | We also provide supplementary material (including code and data) to ensure the reproducibility of experimental results. |
| Open Datasets | Yes | We conducted a thorough study across four benchmarks, including ProteinGym [44], β-Lactamase (Abbr., β-lact.) and Fluorescence (Abbr., Fluo.) from PEER [63], and GB1 (utilizing a 2-vs-rest split) from FLIP [8]. For our datasets, we employed ProteinGym [44] [MIT License], PEER [63] [Apache-2.0 license], FLIP [8] [AFL-3.0 license]. |
| Dataset Splits | Yes | We implemented the Random cross-validation method recommended by [46]. In this approach, each mutation in the dataset is randomly assigned to one of five folds. The model's performance is then evaluated by averaging the results across these five folds. Given a testing dataset, we randomly select an additional 40 datasets from the same category for training. (A fold-assignment sketch follows the table.) |
| Hardware Specification | Yes | All models are trained on four Nvidia V100 32G GPUs for up to 100 epochs by default. |
| Software Dependencies | No | We utilize AdamW as the optimizer. |
| Experiment Setup | Yes | We set the learning rate at 0.0001, with a weight decay of 0.005, utilizing AdamW as the optimizer. All models are trained on four Nvidia V100 32G GPUs for up to 100 epochs by default. (A training-configuration sketch follows the table.) |
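The Dataset Splits row quotes a random cross-validation protocol: each mutation is randomly assigned to one of five folds, and results are averaged across folds. The following is a minimal sketch of that fold assignment, not the authors' released code; the `evaluate` call in the usage comment is a hypothetical placeholder.

```python
import numpy as np
from sklearn.model_selection import KFold

def random_five_fold(num_mutations: int, seed: int = 0):
    """Randomly assign every mutation index to one of five folds and
    yield (train_indices, test_indices) pairs, one pair per fold."""
    kf = KFold(n_splits=5, shuffle=True, random_state=seed)
    yield from kf.split(np.arange(num_mutations))

# Hypothetical usage: score each held-out fold, then average the five results.
# fold_scores = [evaluate(train_idx, test_idx)  # evaluate() is a placeholder
#                for train_idx, test_idx in random_five_fold(num_mutations)]
# mean_score = float(np.mean(fold_scores))
```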
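The Experiment Setup and Hardware rows quote the training configuration: AdamW with a learning rate of 0.0001 and weight decay of 0.005, trained for up to 100 epochs on four V100 GPUs. Below is a minimal PyTorch sketch of that optimizer and loop configuration; `model` and `train_loader` are assumed placeholders, the forward pass is schematic, and multi-GPU details are omitted.

```python
import torch
from torch.optim import AdamW

def train(model: torch.nn.Module, train_loader, device: str = "cuda", epochs: int = 100):
    """Sketch of the quoted setup: AdamW, lr=1e-4, weight decay=0.005,
    up to 100 epochs. The loss computation below is a placeholder; the
    real objective is model-specific."""
    model.to(device)
    optimizer = AdamW(model.parameters(), lr=1e-4, weight_decay=0.005)
    for _ in range(epochs):
        model.train()
        for batch in train_loader:
            optimizer.zero_grad()
            loss = model(batch.to(device))  # placeholder forward pass returning a scalar loss
            loss.backward()
            optimizer.step()
```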