Knowledge-aware Reinforced Language Models for Protein Directed Evolution
Authors: Yuhao Wang, Qiang Zhang, Ming Qin, Xiang Zhuang, Xiaotong Li, Zhichen Gong, Zeyuan Wang, Yu Zhao, Jianhua Yao, Keyan Ding, Huajun Chen
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the superior performance of KnowRLM in more efficiently identifying high-fitness mutants compared to existing methods. Our study utilized two widely recognized public datasets, GB1 (Wu et al., 2019) and PhoQ (Podgornaia & Laub, 2015), to assess the effectiveness of the proposed KnowRLM method. Table 1. Performance comparison across varying sample sizes on the GB1 dataset. Table 3. Results of ablation study on the AAKG. |
| Researcher Affiliation | Collaboration | 1Zhejiang University, 2ZJU-Hangzhou Global Scientific and Technological Innovation Center, 3Tencent AI Lab. |
| Pseudocode | Yes | Algorithm 1: KnowRLM for Directed Evolution |
| Open Source Code | Yes | Our code is available at https://github.com/HICAI-ZJU/KnowRLM. |
| Open Datasets | Yes | Our study utilized two widely recognized public datasets, GB1 (Wu et al., 2019) and PhoQ (Podgornaia & Laub, 2015), to assess the effectiveness of the proposed KnowRLM method. |
| Dataset Splits | Yes | In the preparation phase of the experiment, we employed a clustering method consistent with the one used in the CLADE approach (Qiu et al., 2021), resulting in the sampling of 96 mutants. These samples were subsequently annotated by the oracle, a process integral to evaluating the selected samples' fitness. Following this, the 96 annotated samples served as the initial training data for the reward model, which was developed to furnish reward values for the reinforcement learning algorithm. |
| Hardware Specification | Yes | The code is run on an Ubuntu server equipped with a single GPU (NVIDIA TESLA V100 32G), ensuring high-performance computing capabilities essential for handling the complexity of the model and the properties of the data. |
| Software Dependencies | No | Our model was developed and executed within the PyTorch framework, supplemented by the Stable-Baselines3 (Raffin et al., 2021) framework for reinforcement learning. |
| Experiment Setup | Yes | The discount factor in Eq. (12) is 0.99. The entropy coefficient is set at 0.2, alongside a clipping parameter of 0.2, crucial for stabilizing the policy gradient updates. Specific window parameters are detailed in Appendix C. |
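
The initialization described under Dataset Splits (cluster-based sampling of 96 mutants, oracle annotation, and fitting of an initial reward model) can be illustrated with a minimal sketch. This is not the authors' or CLADE's code: the feature encoding, the number of clusters, the oracle stand-in, and the choice of a RandomForestRegressor as the reward model are all illustrative assumptions.

```python
# A hedged sketch of the described initialization: cluster a pool of candidate
# mutants, draw 96 representatives, label them with the oracle, and fit a
# reward model on the labelled pairs. All concrete choices here are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Placeholder features for a pool of candidate mutants (e.g. sequence encodings).
candidate_features = rng.random((5000, 80))

def oracle_fitness(x):
    """Stand-in for the dataset lookup that returns measured fitness values."""
    return x.sum(axis=1)  # illustrative only

# Cluster the pool and draw an even number of representatives per cluster
# until 96 initial samples are collected.
n_init_samples, n_clusters = 96, 12
labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(candidate_features)
chosen = []
for c in range(n_clusters):
    members = np.flatnonzero(labels == c)
    chosen.extend(rng.choice(members, size=n_init_samples // n_clusters, replace=False))
chosen = np.array(chosen[:n_init_samples])

# Annotate the selected mutants with the oracle and fit the initial reward model
# that will later supply reward values to the reinforcement learning loop.
X_init = candidate_features[chosen]
y_init = oracle_fitness(X_init)
reward_model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_init, y_init)
```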
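
The reported hyperparameters under Experiment Setup (discount factor 0.99, entropy coefficient 0.2, clipping parameter 0.2) map directly onto a Stable-Baselines3 PPO configuration, the reinforcement learning framework named under Software Dependencies. The sketch below is a minimal configuration example, assuming a placeholder Gymnasium environment; KnowRLM's actual mutation environment and reward model are defined in the released code, not here.

```python
# A minimal sketch (not the authors' code) of how the reported hyperparameters
# map onto a Stable-Baselines3 PPO configuration.
import gymnasium as gym
from stable_baselines3 import PPO

# Placeholder environment; KnowRLM uses a custom protein mutation environment
# whose rewards come from the oracle-initialized reward model.
env = gym.make("CartPole-v1")

model = PPO(
    "MlpPolicy",
    env,
    gamma=0.99,      # discount factor reported in Eq. (12)
    ent_coef=0.2,    # entropy coefficient reported in the setup
    clip_range=0.2,  # PPO clipping parameter reported in the setup
    verbose=0,
)
model.learn(total_timesteps=10_000)
```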