Tree Search-Based Evolutionary Bandits for Protein Sequence Optimization

Authors: Jiahao Qiu, Hui Yuan, Jinghong Zhang, Wentao Chen, Huazheng Wang, Mengdi Wang

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We test various instances of the algorithm across benchmark protein datasets using simulated screens. Experiment results demonstrate that the algorithm is sample-efficient, diversity-promoting, and able to find top designs using reasonably small mutation counts. We experiment using three datasets from protein engineering studies and train oracles to simulate the ground-truth wet-lab fitness scores f of the landscape."
Researcher Affiliation | Collaboration | Jiahao Qiu*1, Hui Yuan*1, Jinghong Zhang*2, Wentao Chen3, Huazheng Wang4, Mengdi Wang1. Affiliations: 1Princeton University; 2University of California San Diego; 3MLAB Biosciences Inc; 4Oregon State University
Pseudocode | Yes | Algorithm 1: Meta algorithm
Open Source Code | No | The paper provides no statement or link indicating that code for the described methodology has been open-sourced.
Open Datasets | Yes | "We build experiments around real-world protein datasets, such as AAV (Bryant et al. 2021), TEM (Gonzalez and Ostermeier 2019) and AAYL49 antibody (Engelhart et al. 2022)."
Dataset Splits | No | The paper describes an iterative exploration process in which models query sequences from a black-box oracle, rather than specifying fixed training/validation/test splits with percentages or counts.
Hardware Specification | No | The paper acknowledges 'abundant computing resources' from MLAB Biosciences Inc but gives no specific hardware details such as GPU models, CPU types, or memory.
Software Dependencies | No | The paper mentions a 'neural network', the 'UCB/TS formula', a 'CNN', and 'TAPE embedding' but provides no version numbers for any software libraries, frameworks, or dependencies.
Experiment Setup | Yes | "In the experiment, we run each algorithm for 10 rounds with 100 query sequences per round for a fair comparison with our baselines. Each test is run for 50 repeats using 50 different random seeds."
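The Experiment Setup row describes a round-based simulated screen: a fixed number of rounds, a fixed query budget per round, and a black-box oracle standing in for wet-lab fitness. The sketch below is a minimal, hypothetical illustration of that protocol only; it uses a greedy single-point-mutation proposer and a toy oracle, not the paper's actual tree-search bandit algorithm, and every name (`simulate_screen`, the alphabet, the pool size) is an assumption for illustration.

```python
import random

def simulate_screen(oracle, seed, n_rounds=10, n_queries=100):
    """Round-based screening loop: each round proposes n_queries
    single-point mutants of the current parent pool and scores them
    with the black-box oracle. Returns the best (fitness, sequence)."""
    rng = random.Random(seed)
    alphabet = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
    start = "".join(rng.choice(alphabet) for _ in range(8))  # random seed sequence
    pool = [start]
    best = (oracle(start), start)
    for _ in range(n_rounds):
        # Propose a batch of single-point mutants of sequences in the pool.
        batch = []
        for _ in range(n_queries):
            parent = rng.choice(pool)
            i = rng.randrange(len(parent))
            batch.append(parent[:i] + rng.choice(alphabet) + parent[i + 1:])
        # Score the batch with the oracle (one "wet-lab screen").
        scored = sorted(((oracle(s), s) for s in batch), reverse=True)
        best = max(best, scored[0])
        pool = [s for _, s in scored[:10]]  # top 10 become next round's parents

    return best

# Toy oracle: fitness = number of 'A' residues (a stand-in for a trained model).
if __name__ == "__main__":
    results = [simulate_screen(lambda s: s.count("A"), seed) for seed in range(5)]
    print([fit for fit, _ in results])
```

Repeating this call over 50 seeds and averaging would mirror the paper's "50 repeats using 50 different random seeds" evaluation; swapping the greedy top-k selection for a UCB or Thompson-sampling rule over a mutation tree is where the paper's method would differ from this sketch.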