Tree Search-Based Evolutionary Bandits for Protein Sequence Optimization
Authors: Jiahao Qiu, Hui Yuan, Jinghong Zhang, Wentao Chen, Huazheng Wang, Mengdi Wang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test various instances of the algorithm across benchmark protein datasets using simulated screens. Experiment results demonstrate that the algorithm is sample-efficient, diversity-promoting, and able to find top designs using reasonably small mutation counts. We experiment using three datasets from protein engineering studies and train oracles to simulate the ground-truth wet-lab fitness scores f of the landscape. |
| Researcher Affiliation | Collaboration | Jiahao Qiu\*¹, Hui Yuan\*¹, Jinghong Zhang\*², Wentao Chen³, Huazheng Wang⁴, Mengdi Wang¹ (¹Princeton University; ²University of California San Diego; ³MLAB Biosciences Inc; ⁴Oregon State University) |
| Pseudocode | Yes | Algorithm 1: Meta algorithm |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-sourcing of the code for the methodology described in this paper. |
| Open Datasets | Yes | We build experiments around real-world protein datasets, such as AAV (Bryant et al. 2021), TEM (Gonzalez and Ostermeier 2019) and AAYL49 antibody (Engelhart et al. 2022). |
| Dataset Splits | No | The paper describes an iterative exploration process where models query sequences from a black-box oracle, rather than specifying traditional fixed training, validation, and test dataset splits with percentages or counts. |
| Hardware Specification | No | The paper mentions 'abundant computing resources' from MLAB Biosciences Inc. but does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for the experiments. |
| Software Dependencies | No | The paper mentions the use of 'neural network', 'UCB/TS formula', 'CNN', and 'TAPE embedding' but does not provide specific version numbers for any software libraries, frameworks, or dependencies. |
| Experiment Setup | Yes | In the experiment, we run each algorithm for 10 rounds with 100 query sequences per round for a fair comparison with our baselines. Each test is run for 50 repeats using 50 different random seeds. |
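
Because the paper does not release code, the following is only a toy sketch of the evaluation protocol in the Experiment Setup row (10 rounds of 100 queries, repeated over random seeds). It does not implement the authors' tree-search algorithm or their neural UCB/TS models. The oracle `make_toy_oracle` (fraction of positions matching a hidden random target), the single-substitution `mutate`, the top-`pool_size` parent pool, and the count-based UCB bonus with constant `c` are all hypothetical simplifications chosen to show how an evolutionary bandit can treat parent sequences as arms and batched screens as delayed rewards.

```python
import math
import random
import statistics

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def make_toy_oracle(seq_len, rng):
    """Hypothetical stand-in for a trained fitness oracle:
    the fraction of positions that match a hidden random target."""
    target = "".join(rng.choice(AMINO_ACIDS) for _ in range(seq_len))
    return lambda seq: sum(a == b for a, b in zip(seq, target)) / seq_len


def mutate(seq, rng, n_mut=1):
    """Substitute n_mut distinct random positions with random amino acids."""
    chars = list(seq)
    for pos in rng.sample(range(len(chars)), n_mut):
        chars[pos] = rng.choice(AMINO_ACIDS)
    return "".join(chars)


def run_campaign(seed, n_rounds=10, batch_size=100, seq_len=30, pool_size=20, c=0.3):
    """Simulate n_rounds batched screens; return (best fitness found, oracle queries used)."""
    rng = random.Random(seed)
    oracle = make_toy_oracle(seq_len, rng)
    start = "".join(rng.choice(AMINO_ACIDS) for _ in range(seq_len))
    fitness = {start: oracle(start)}  # every sequence screened so far (start is not counted as a query)
    pulls, reward_sum = {}, {}        # bandit statistics per parent ("arm")
    n_queries = 0
    for _ in range(n_rounds):
        # Evolutionary step: the top pool_size screened sequences become the arms.
        pool = sorted(fitness, key=fitness.get, reverse=True)[:pool_size]
        pending = dict.fromkeys(pool, 0)  # picks this round whose rewards are not observed yet
        batch = []
        for _ in range(batch_size):
            total = 1 + sum(pulls.get(p, 0) + pending[p] for p in pool)

            def ucb(p):
                n = pulls.get(p, 0)
                # Mean offspring fitness; before any pulls, use the parent's own fitness.
                mean = reward_sum[p] / n if n else fitness[p]
                # Pending picks shrink this parent's bonus, spreading the batch over several parents.
                return mean + c * math.sqrt(math.log(total) / (1 + n + pending[p]))

            parent = max(pool, key=ucb)
            pending[parent] += 1
            batch.append((parent, mutate(parent, rng)))
        # Simulated screen: score the whole batch, then update the arm statistics.
        for parent, child in batch:
            y = oracle(child)
            n_queries += 1
            pulls[parent] = pulls.get(parent, 0) + 1
            reward_sum[parent] = reward_sum.get(parent, 0.0) + y
            fitness[child] = y
    return max(fitness.values()), n_queries


def summarize(seeds):
    """Repeat the campaign once per seed; return mean and stdev of the best fitness."""
    best = [run_campaign(seed)[0] for seed in seeds]
    return statistics.mean(best), statistics.stdev(best)
```

To mirror the reported protocol, `summarize(range(50))` repeats the campaign for 50 seeds. Each run issues 10 × 100 = 1,000 oracle queries, so one algorithm consumes 50,000 simulated screens in total. Scoring the whole batch before updating any statistics reflects that a wet-lab round returns all measurements at once.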