Designing Biological Sequences without Prior Knowledge Using Evolutionary Reinforcement Learning
Authors: Xi Zeng, Xiaotian Hao, Hongyao Tang, Zhentao Tang, Shaoqing Jiao, Dazhi Lu, Jiajie Peng
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated the proposed method on three main types of biological sequence design tasks, including the design of DNA, RNA, and protein. The results demonstrate that the proposed method achieves significant improvement compared to the existing state-of-the-art methods. In this section, we show experimental results from a range of biologically relevant sequence design tasks to demonstrate the effectiveness of our proposed ERLBioSeq algorithm. Additionally, we conduct ablation studies to explore the contribution of each individual design component. |
| Researcher Affiliation | Collaboration | Xi Zeng¹, Xiaotian Hao², Hongyao Tang², Zhentao Tang³, Shaoqing Jiao¹, Dazhi Lu¹, Jiajie Peng¹,⁴,⁵*. ¹School of Computer Science, Northwestern Polytechnical University; ²College of Intelligence and Computing, Tianjin University; ³Noah's Ark Lab, Huawei; ⁴Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology; ⁵School of Computer Science, Research and Development Institute of Northwestern Polytechnical University in Shenzhen |
| Pseudocode | Yes | Algorithm 1: ERLBioSeq |
| Open Source Code | Yes | Implementation specifics of ERLBioSeq are provided in Appendix A.1. 1https://github.com/reset001/ERLBio Seqappendix. |
| Open Datasets | Yes | RNA Binding Task. This task aims to optimize RNA sequences to achieve the highest binding energy with nucleotide targets of lengths 14 and 50. The ViennaRNA package is utilized to compute the binding energy of RNA sequences (Lorenz et al. 2011). We follow the design task presented by Sinai et al. (2020). Protein Design Task. We evaluate the algorithms in the context of protein design tasks, employing PyRosetta (Chaudhury, Lyskov, and Gray 2010) as the objective function... Adhering to the experimental configuration outlined in (Sinai et al. 2020), we optimize the structure of 3MSI, a 66-amino-acid antifreeze protein naturally occurring in oceanic environments (De Luca et al. 1998). TF Bind 8 Task. This task aims to find DNA sequences of length 8 that have high binding activity to human transcription factors. We use the same data as in Barrera et al. (2016) and follow the experimental setup of Trabucco et al. (2022). |
| Dataset Splits | No | The paper describes an iterative design process where data is collected and models are trained. It mentions 'train' in the context of training the fitness model and RL policy, but it does not specify explicit train/validation/test dataset splits (e.g., percentages or counts) for the tasks. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware (e.g., CPU, GPU models, memory) used to conduct the experiments. |
| Software Dependencies | No | The paper mentions software like 'ViennaRNA package' and 'PyRosetta' but does not provide specific version numbers for these or any other software dependencies, libraries, or programming languages used. |
| Experiment Setup | No | The paper states 'Further experimental details are provided in Appendix A.2.' which implies that specific experimental setup details, such as hyperparameters, are not provided in the main text. |
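
To give a sense of scale for the TF Bind 8 task described in the table, the sketch below enumerates the full space of length-8 DNA sequences (4^8 = 65,536) and one-hot encodes a sequence. It is only an illustration of the search space, not the ERLBioSeq algorithm: the helper names are hypothetical, and `toy_fitness` is a made-up stand-in for the measured binding data from Barrera et al. (2016).

```python
from itertools import product

ALPHABET = "ACGT"  # DNA nucleotides
SEQ_LEN = 8        # sequence length for TF Bind 8, as stated in the table


def all_sequences(length=SEQ_LEN, alphabet=ALPHABET):
    """Yield every sequence of the given length over the alphabet."""
    for letters in product(alphabet, repeat=length):
        yield "".join(letters)


def one_hot(seq, alphabet=ALPHABET):
    """Encode a sequence as one-hot rows, one row per position."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    return [[1 if index[ch] == j else 0 for j in range(len(alphabet))]
            for ch in seq]


def toy_fitness(seq):
    """Hypothetical oracle (assumption): fraction of G/C nucleotides.

    This is NOT the real binding-activity measurement; it only makes
    the example runnable without the dataset.
    """
    return sum(ch in "GC" for ch in seq) / len(seq)


def best_by_exhaustive_search(fitness=toy_fitness):
    """Return a highest-scoring sequence by scanning the whole space."""
    return max(all_sequences(), key=fitness)
```

Exhaustive search is feasible here only because the space is small; the RNA and protein tasks in the table have far larger spaces (for example, 20^66 candidate sequences for a 66-residue protein), which is why learned search strategies are used instead.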