reproducibilityindex.ai

Designing Biological Sequences without Prior Knowledge Using Evolutionary Reinforcement Learning

Authors: Xi Zeng, Xiaotian Hao, Hongyao Tang, Zhentao Tang, Shaoqing Jiao, Dazhi Lu, Jiajie Peng

AAAI 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluated the proposed method on three main types of biological sequence design tasks, including the design of DNA, RNA, and protein. The results demonstrate that the proposed method achieves significant improvement compared to the existing state-of-the-art methods. In this section, we show experimental results from a range of biologically relevant sequence design tasks to demonstrate the effectiveness of our proposed ERLBioSeq algorithm. Additionally, we conduct ablation studies to explore the contribution of each individual design component.
Researcher Affiliation	Collaboration	Xi Zeng (1), Xiaotian Hao (2), Hongyao Tang (2), Zhentao Tang (3), Shaoqing Jiao (1), Dazhi Lu (1), Jiajie Peng (1, 4, 5)*. (1) School of Computer Science, Northwestern Polytechnical University; (2) College of Intelligence and Computing, Tianjin University; (3) Noah's Ark Lab, Huawei; (4) Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology; (5) School of Computer Science, Research and Development Institute of Northwestern Polytechnical University in Shenzhen
Pseudocode	Yes	Algorithm 1: ERLBioSeq
Open Source Code	Yes	Implementation specifics of ERLBioSeq are provided in Appendix A.1. Footnote 1: https://github.com/reset001/ERLBio Seqappendix.
Open Datasets	Yes	RNA Binding Task. This task aims to optimize RNA sequences to achieve the highest binding energy with nucleotide targets of lengths 14 and 50. The ViennaRNA package is utilized to compute the binding energy of RNA sequences (Lorenz et al. 2011). We follow the design task presented by Sinai et al. (2020). Protein Design Task. We evaluate the algorithms in the context of protein design tasks, employing PyRosetta (Chaudhury, Lyskov, and Gray 2010) as the objective function... Adhering to the experimental configuration outlined in (Sinai et al. 2020), we optimize the structure of 3MSI, a 66-amino-acid antifreeze protein naturally occurring in oceanic environments (De Luca et al. 1998). TF Bind 8 Task. This task aims to find DNA sequences of length 8 that have high binding activity to human transcription factors. We use the same data as in Barrera et al. (2016) and follow the experimental setup of Trabucco et al. (2022).
Dataset Splits	No	The paper describes an iterative design process where data is collected and models are trained. It mentions 'train' in the context of training the fitness model and RL policy, but it does not specify explicit train/validation/test dataset splits (e.g., percentages or counts) for the tasks.
Hardware Specification	No	The paper does not provide any specific details regarding the hardware (e.g., CPU, GPU models, memory) used to conduct the experiments.
Software Dependencies	No	The paper mentions software such as the 'ViennaRNA package' and 'PyRosetta' but does not provide specific version numbers for these or any other software dependencies, libraries, or programming languages used.
Experiment Setup	No	The paper states 'Further experimental details are provided in Appendix A.2.' which implies that specific experimental setup details, such as hyperparameters, are not provided in the main text.