Knowledge-aware Reinforced Language Models for Protein Directed Evolution

Authors: Yuhao Wang, Qiang Zhang, Ming Qin, Xiang Zhuang, Xiaotong Li, Zhichen Gong, Zeyuan Wang, Yu Zhao, Jianhua Yao, Keyan Ding, Huajun Chen

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate the superior performance of KnowRLM in more efficiently identifying high-fitness mutants compared to existing methods. Our study utilized two widely recognized public datasets, GB1 (Wu et al., 2019) and PhoQ (Podgornaia & Laub, 2015), to assess the effectiveness of the proposed KnowRLM method. Table 1. Performance comparison across varying sample sizes on the GB1 dataset. Table 3. Results of ablation study on the AAKG.
Researcher Affiliation | Collaboration | Zhejiang University; ZJU-Hangzhou Global Scientific and Technological Innovation Center; Tencent AI Lab.
Pseudocode | Yes | Algorithm 1: KnowRLM for Directed Evolution
Open Source Code | Yes | Our code is available at https://github.com/HICAI-ZJU/KnowRLM.
Open Datasets | Yes | Our study utilized two widely recognized public datasets, GB1 (Wu et al., 2019) and PhoQ (Podgornaia & Laub, 2015), to assess the effectiveness of the proposed KnowRLM method.
Dataset Splits | Yes | In the preparation phase of the experiment, we employed a clustering method consistent with the one used in the CLADE approach (Qiu et al., 2021), resulting in the sampling of 96 mutants. These samples were subsequently annotated by the oracle, a process integral to evaluating the selected samples' fitness. Following this, the 96 annotated samples served as the initial training data for the reward model, which was developed to furnish reward values for the reinforcement learning algorithm. (A clustering-based sampling sketch follows the table.)
Hardware Specification | Yes | The code is run on an Ubuntu server equipped with a single GPU (NVIDIA Tesla V100, 32 GB), ensuring high-performance computing capabilities essential for handling the complexity of the model and the property of the data.
Software Dependencies | No | Our model was developed and executed within the PyTorch framework, supplemented by the Stable-Baselines3 (Raffin et al., 2021) framework for reinforcement learning.
Experiment Setup | Yes | The discount factor in Eq. (12) is 0.99. The entropy coefficient is set at 0.2, alongside a clipping parameter of 0.2, crucial for stabilizing the policy gradient updates. Specific window parameters are detailed in Appendix C. (A PPO configuration sketch mapping these values follows the table.)
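
The Dataset Splits entry describes a clustering-based selection of 96 initial mutants that are annotated by the oracle and then used to train the reward model. The snippet below is a minimal sketch of what such a selection step could look like, assuming scikit-learn's KMeans over precomputed mutant feature vectors and a hypothetical oracle_fitness lookup; the paper's actual procedure follows CLADE (Qiu et al., 2021) and may differ in feature encoding and cluster assignment.

```python
# Minimal sketch (not the authors' code): cluster-based selection of 96 initial
# mutants, as described in the Dataset Splits entry. Assumes precomputed feature
# vectors for all candidate mutants; oracle_fitness() is a hypothetical lookup
# against the GB1/PhoQ fitness landscapes.
import numpy as np
from sklearn.cluster import KMeans

def sample_initial_mutants(features, mutant_ids, n_samples=96, seed=0):
    """Pick one representative mutant per cluster (closest to the centroid)."""
    kmeans = KMeans(n_clusters=n_samples, random_state=seed, n_init=10)
    labels = kmeans.fit_predict(features)
    selected = []
    for k in range(n_samples):
        members = np.where(labels == k)[0]
        dists = np.linalg.norm(features[members] - kmeans.cluster_centers_[k], axis=1)
        selected.append(mutant_ids[members[np.argmin(dists)]])
    return selected

# The 96 selected mutants would then be annotated by the oracle and used as the
# initial training set for the reward model, e.g.:
# initial_set = [(m, oracle_fitness(m)) for m in sample_initial_mutants(X, ids)]
```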
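
The Experiment Setup entry reports a discount factor of 0.99, an entropy coefficient of 0.2, and a clipping parameter of 0.2, and the Software Dependencies entry names Stable-Baselines3. Below is a minimal, hypothetical sketch of how those values map onto a Stable-Baselines3 PPO configuration; the policy and environment are placeholders, not the paper's directed-evolution environment.

```python
# Minimal sketch (not the authors' code): mapping the reported hyperparameters
# onto a Stable-Baselines3 PPO configuration. The environment is a placeholder
# for the paper's custom mutation-proposal environment.
from stable_baselines3 import PPO

model = PPO(
    policy="MlpPolicy",   # placeholder policy class
    env="CartPole-v1",    # placeholder environment id
    gamma=0.99,           # discount factor from Eq. (12)
    ent_coef=0.2,         # entropy coefficient reported in the paper
    clip_range=0.2,       # PPO clipping parameter for stable policy-gradient updates
    verbose=1,
)
model.learn(total_timesteps=10_000)
```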