Model-based reinforcement learning for biological sequence design
Authors: Christof Angermueller, David Dohan, David Belanger, Ramya Deshpande, Kevin Murphy, Lucy Colwell
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We rigorously evaluate our method on three in-silico sequence design tasks that draw on experimental data to construct functions f(x) characteristic of real-world design problems: optimizing binding affinity of DNA sequences of length 8 (search space size 4^8); optimizing anti-microbial peptide sequences (search space size 20^50); and optimizing binary sequences where f(x) is defined by the energy of an Ising model for protein structure (search space size 2^50). These do not rely on wet-lab experiments, and thus allow for large-scale benchmarking across a range of methods. We show that our DyNA PPO method achieves higher cumulative reward for a given budget (measured in terms of number of calls to f(x)) than existing methods, such as standard PPO, various forms of the cross-entropy method, Bayesian optimization, and evolutionary search. |
| Researcher Affiliation | Collaboration | Christof Angermueller (Google Research, christofa@google.com); David Dohan (Google Research, ddohan@google.com); David Belanger (Google Research, dbelanger@google.com); Ramya Deshpande (Caltech, rdeshpan@caltech.edu); Kevin Murphy (Google Research, kpmurphy@google.com); Lucy Colwell (Google Research and University of Cambridge, lcolwell@google.com) |
| Pseudocode | Yes | Algorithm 1: DyNA PPO (a hedged sketch of this loop appears after the table) |
| Open Source Code | No | The paper does not provide a link to its own source code for the DyNA PPO method; it only mentions using the TF-Agents RL library and Scikit-learn, which are external libraries. |
| Open Datasets | Yes | We used the dataset described by Barrera et al. (2016) [..] We downloaded the dataset provided by Witten & Witten (2019) (https://github.com/zswitten/Antimicrobial-Peptides), which contains 6,760 unique AMP sequences and their antimicrobial activity towards multiple pathogens. |
| Dataset Splits | Yes | We use one task (CRX REF R1) for optimizing the hyper-parameters of all methods, and test performance on 41 heterogeneous hold-out tasks. [..] We quantify the accuracy of each candidate model by the R² score, which we estimate by five-fold cross-validation. (See the cross-validation sketch after the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, memory specifications). |
| Software Dependencies | No | We implement algorithms using the TF-Agents RL library (Guadarrama et al., 2018). [..] We considered the following candidate models (implemented in Scikit-learn (Pedregosa et al., 2011)). The paper mentions software libraries but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | On the left of Figure 2 we vary the number of inner-loop policy optimization rounds with observations from the model-based environment, where using 0 rounds corresponds to performing standard PPO. [..] As hyper-parameters, we tuned the learning rate, number of training steps, adaptive KL target, and entropy regularization. [..] For DyNA PPO, we also tuned the maximum number of model-based optimization rounds M (see Section 2.3). |
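
Since no source code is linked, the following is a minimal sketch of how the DyNA PPO loop (Algorithm 1) could look, reconstructed from the excerpts quoted above: an outer loop that queries the true fitness function f(x), and an inner loop of up to M cheap policy-optimization rounds against a learned surrogate that is only trusted when its cross-validated R² clears a threshold. The `policy.sample`/`policy.update` interface, the candidate-model list, the R² threshold, and the assumption that sequences arrive already featurized (e.g., one-hot encoded) are all illustrative placeholders, not the authors' released implementation.

```python
# Hedged sketch of the DyNA PPO loop (Algorithm 1), reconstructed from the
# paper's description. Helper names and defaults are assumptions, not the
# authors' code (no implementation is linked in the paper).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score


def fit_surrogates(X, y, r2_threshold=0.5):
    """Fit candidate regressors and keep those whose five-fold CV R^2 clears
    a threshold, as the paper describes; the candidate list and threshold
    here are illustrative assumptions."""
    candidates = [Ridge(alpha=1.0), RandomForestRegressor(n_estimators=100)]
    kept = []
    for model in candidates:
        r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
        if r2 >= r2_threshold:
            kept.append(model.fit(X, y))
    return kept  # empty list => skip model-based rounds this iteration


def dyna_ppo(f, policy, n_outer_rounds, m_model_rounds, batch_size):
    """Outer loop spends the real query budget on f; inner loop reuses a
    surrogate of f for extra (cheap) policy-optimization rounds."""
    X, y = [], []
    for _ in range(n_outer_rounds):
        batch = policy.sample(batch_size)          # propose featurized sequences
        rewards = np.array([f(x) for x in batch])  # expensive oracle calls
        policy.update(batch, rewards)              # standard PPO update
        X.extend(batch)
        y.extend(rewards)

        surrogates = fit_surrogates(np.asarray(X), np.asarray(y))
        for _ in range(m_model_rounds if surrogates else 0):
            sim_batch = policy.sample(batch_size)
            # Reward = mean prediction of the retained surrogate ensemble.
            sim_rewards = np.mean(
                [m.predict(np.asarray(sim_batch)) for m in surrogates], axis=0)
            policy.update(sim_batch, sim_rewards)  # model-based PPO update
    return policy
```

Setting `m_model_rounds=0` (or failing the R² gate) reduces this to standard PPO, which matches the ablation quoted in the Experiment Setup row.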
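
The model-accuracy metric quoted in the Dataset Splits row (R² estimated by five-fold cross-validation) can be reproduced standalone with Scikit-learn, which the paper cites. The data below is synthetic stand-in for one-hot encoded length-8 DNA sequences; the real tasks use the Barrera et al. (2016) and Witten & Witten (2019) datasets.

```python
# Five-fold cross-validated R^2, the candidate-model accuracy metric from the
# Dataset Splits row. Synthetic data; only the CV protocol mirrors the paper.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Toy stand-in for one-hot encoded length-8 DNA sequences (4-letter alphabet).
X = rng.integers(0, 2, size=(200, 8 * 4)).astype(float)
y = X[:, :8].sum(axis=1) + 0.1 * rng.normal(size=200)  # synthetic f(x)

model = RandomForestRegressor(n_estimators=100, random_state=0)
r2_scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"per-fold R^2: {np.round(r2_scores, 3)}, mean: {r2_scores.mean():.3f}")
```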