Tree Search-Based Evolutionary Bandits for Protein Sequence Optimization

Authors: Jiahao Qiu, Hui Yuan, Jinghong Zhang, Wentao Chen, Huazheng Wang, Mengdi Wang

AAAI 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We test various instances of the algorithm across benchmark protein datasets using simulated screens. Experiment results demonstrate that the algorithm is sample-efficient, diversity-promoting, and able to find top designs using reasonably small mutation counts. We experiment using three datasets from protein engineering studies and train oracles to simulate the ground-truth wet-lab fitness scores f of the landscape.
Researcher Affiliation	Collaboration	Jiahao Qiu (1), Hui Yuan (1), Jinghong Zhang* (2), Wentao Chen (3), Huazheng Wang (4), Mengdi Wang (1). Affiliations: (1) Princeton University; (2) University of California San Diego; (3) MLAB Biosciences Inc; (4) Oregon State University
Pseudocode	Yes	Algorithm 1: Meta algorithm
Open Source Code	No	The paper does not provide an explicit statement or link for the open-sourcing of the code for the methodology described in this paper.
Open Datasets	Yes	We build experiments around real-world protein datasets, such as AAV (Bryant et al. 2021), TEM (Gonzalez and Ostermeier 2019) and AAYL49 antibody (Engelhart et al. 2022).
Dataset Splits	No	The paper describes an iterative exploration process where models query sequences from a black-box oracle, rather than specifying traditional fixed training, validation, and test dataset splits with percentages or counts.
Hardware Specification	No	The paper mentions 'abundant computing resources' from MLAB Biosciences Inc. but does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for the experiments.
Software Dependencies	No	The paper mentions the use of 'neural network', 'UCB/TS formula', 'CNN', and 'TAPE embedding' but does not provide specific version numbers for any software libraries, frameworks, or dependencies.
Experiment Setup	Yes	In the experiment, we run each algorithm for 10 rounds with 100 query sequences per round for a fair comparison with our baselines. Each test is run for 50 repeats using 50 different random seeds.
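
The evaluation protocol quoted in the Experiment Setup row (10 rounds, 100 oracle queries per round, 50 repeats with 50 different seeds) can be sketched roughly as below. This is a minimal illustration of the budget and seeding structure only, not the paper's algorithm: `toy_oracle`, `mutate`, `run_once`, and `run_benchmark` are hypothetical names, the oracle is a toy match-counting function standing in for a trained fitness model, and the proposal step is plain random single-site mutation rather than the tree search-based bandit method.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
ROUNDS = 10       # rounds per run (from the quoted setup)
BATCH_SIZE = 100  # query sequences per round (from the quoted setup)
NUM_SEEDS = 50    # repeats, one random seed each (from the quoted setup)


def toy_oracle(seq, target):
    """Hypothetical stand-in for a trained oracle.

    Returns the fraction of positions where seq matches target.
    """
    return sum(a == b for a, b in zip(seq, target)) / len(target)


def mutate(seq, rng):
    """Replace one randomly chosen position with a random amino acid."""
    pos = rng.randrange(len(seq))
    return seq[:pos] + rng.choice(ALPHABET) + seq[pos + 1:]


def run_once(seed, wild_type, target):
    """One repeat: ROUNDS batches of BATCH_SIZE oracle queries each."""
    rng = random.Random(seed)  # per-repeat generator keeps runs reproducible
    # Scoring the starting sequence is not counted against the query budget here.
    best_seq, best_fit = wild_type, toy_oracle(wild_type, target)
    queries = 0
    for _ in range(ROUNDS):
        # Placeholder proposal step: random mutants of the current best sequence.
        batch = [mutate(best_seq, rng) for _ in range(BATCH_SIZE)]
        scores = [toy_oracle(s, target) for s in batch]
        queries += len(batch)
        top = max(range(len(batch)), key=scores.__getitem__)
        if scores[top] > best_fit:
            best_seq, best_fit = batch[top], scores[top]
    return best_fit, queries


def run_benchmark(wild_type, target):
    """Repeat the run over NUM_SEEDS seeds and average the best fitness."""
    results = [run_once(seed, wild_type, target) for seed in range(NUM_SEEDS)]
    fits = [fit for fit, _ in results]
    return sum(fits) / len(fits), results
```

Under these assumptions every method is compared at the same budget of 10 × 100 = 1000 oracle queries per repeat, and fixing the seed list makes each repeat deterministic, which is what allows a like-for-like comparison against baselines.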