Model-based reinforcement learning for biological sequence design
Authors: Christof Angermueller, David Dohan, David Belanger, Ramya Deshpande, Kevin Murphy, Lucy Colwell
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We rigorously evaluate our method on three in-silico sequence design tasks that draw on experimental data to construct functions f(x) characteristic of real-world design problems: optimizing binding affinity of DNA sequences of length 8 (search space size 4^8); optimizing anti-microbial peptide sequences (search space size 20^50); and optimizing binary sequences where f(x) is defined by the energy of an Ising model for protein structure (search space size 2^50). These do not rely on wet-lab experiments, and thus allow for large-scale benchmarking across a range of methods. We show that our DyNA PPO method achieves higher cumulative reward for a given budget (measured in terms of number of calls to f(x)) than existing methods, such as standard PPO, various forms of the cross-entropy method, Bayesian optimization, and evolutionary search. |
| Researcher Affiliation | Collaboration | Christof Angermueller (Google Research, christofa@google.com); David Dohan (Google Research, ddohan@google.com); David Belanger (Google Research, dbelanger@google.com); Ramya Deshpande (Caltech, rdeshpan@caltech.edu); Kevin Murphy (Google Research, kpmurphy@google.com); Lucy Colwell (Google Research and University of Cambridge, lcolwell@google.com) |
| Pseudocode | Yes | Algorithm 1: DyNA PPO (a hedged sketch of this loop appears after the table) |
| Open Source Code | No | The paper does not provide a link to its own source code for the DyNA PPO method; it only mentions using the TF-Agents RL library and Scikit-learn, which are external libraries. |
| Open Datasets | Yes | We used the dataset described by Barrera et al. (2016) [..] We downloaded the dataset provided by Witten & Witten (2019) (https://github.com/zswitten/Antimicrobial-Peptides), which contains 6,760 unique AMP sequences and their antimicrobial activity towards multiple pathogens. |
| Dataset Splits | Yes | We use one task (CRX REF R1) for optimizing the hyper-parameters of all methods, and test performance on 41 heterogeneous hold-out tasks. [..] We quantify the accuracy of each candidate model by the R² score, which we estimate by five-fold cross-validation. (See the cross-validation sketch after the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, memory specifications). |
| Software Dependencies | No | We implement algorithms using the TF-Agents RL library (Guadarrama et al., 2018). [..] We considered the following candidate models (implemented in Scikit-learn (Pedregosa et al., 2011)). The paper mentions software libraries but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | On the left of Figure 2 we vary the number of inner-loop policy optimization rounds with observations from the model-based environment, where using 0 rounds corresponds to performing standard PPO. [..] As hyper-parameters, we tuned the learning rate, number of training steps, adaptive KL target, and entropy regularization. [..] For DyNA PPO, we also tuned the maximum number of model-based optimization rounds M (see Section 2.3). |
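
Since no source code is linked, the following is a minimal sketch of how the DyNA PPO loop (Algorithm 1) could look, reconstructed from the excerpts quoted above: an outer loop that queries the true fitness function f(x), and an inner loop of up to M cheap policy-optimization rounds against a learned surrogate that is only trusted when its cross-validated R² clears a threshold. The `policy.sample`/`policy.update` interface, the candidate-model list, the R² threshold, and the assumption that sequences arrive already featurized (e.g., one-hot encoded) are all illustrative placeholders, not the authors' released implementation.

```python
# Hedged sketch of the DyNA PPO loop (Algorithm 1), reconstructed from the
# paper's description. Helper names and defaults are assumptions, not the
# authors' code (no implementation is linked in the paper).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score


def fit_surrogates(X, y, r2_threshold=0.5):
    """Fit candidate regressors and keep those whose five-fold CV R^2 clears
    a threshold, as the paper describes; the candidate list and threshold
    here are illustrative assumptions."""
    candidates = [Ridge(alpha=1.0), RandomForestRegressor(n_estimators=100)]
    kept = []
    for model in candidates:
        r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
        if r2 >= r2_threshold:
            kept.append(model.fit(X, y))
    return kept  # empty list => skip model-based rounds this iteration


def dyna_ppo(f, policy, n_outer_rounds, m_model_rounds, batch_size):
    """Outer loop spends the real query budget on f; inner loop reuses a
    surrogate of f for extra (cheap) policy-optimization rounds."""
    X, y = [], []
    for _ in range(n_outer_rounds):
        batch = policy.sample(batch_size)          # propose featurized sequences
        rewards = np.array([f(x) for x in batch])  # expensive oracle calls
        policy.update(batch, rewards)              # standard PPO update
        X.extend(batch)
        y.extend(rewards)

        surrogates = fit_surrogates(np.asarray(X), np.asarray(y))
        for _ in range(m_model_rounds if surrogates else 0):
            sim_batch = policy.sample(batch_size)
            # Reward = mean prediction of the retained surrogate ensemble.
            sim_rewards = np.mean(
                [m.predict(np.asarray(sim_batch)) for m in surrogates], axis=0)
            policy.update(sim_batch, sim_rewards)  # model-based PPO update
    return policy
```

Setting `m_model_rounds=0` (or failing the R² gate) reduces this to standard PPO, which matches the ablation quoted in the Experiment Setup row.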
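
The model-accuracy metric quoted in the Dataset Splits row (R² estimated by five-fold cross-validation) can be reproduced standalone with Scikit-learn, which the paper cites. The data below is synthetic stand-in for one-hot encoded length-8 DNA sequences; the real tasks use the Barrera et al. (2016) and Witten & Witten (2019) datasets.

```python
# Five-fold cross-validated R^2, the candidate-model accuracy metric from the
# Dataset Splits row. Synthetic data; only the CV protocol mirrors the paper.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Toy stand-in for one-hot encoded length-8 DNA sequences (4-letter alphabet).
X = rng.integers(0, 2, size=(200, 8 * 4)).astype(float)
y = X[:, :8].sum(axis=1) + 0.1 * rng.normal(size=200)  # synthetic f(x)

model = RandomForestRegressor(n_estimators=100, random_state=0)
r2_scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"per-fold R^2: {np.round(r2_scores, 3)}, mean: {r2_scores.mean():.3f}")
```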