Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders

Authors: Samuel Stanton, Wesley Maddox, Nate Gruver, Phillip Maffettone, Emily Delaney, Peyton Greenside, Andrew Gordon Wilson

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate LaMBO on two small-molecule design tasks, and introduce new tasks optimizing in silico and in vitro properties of large-molecule fluorescent proteins. In our experiments LaMBO outperforms genetic optimizers and does not require a large pretraining corpus, demonstrating that BayesOpt is practical and effective for biological sequence design.
Researcher Affiliation | Collaboration | (1) Center for Data Science, New York University, New York, USA; (2) Courant Institute of Mathematical Sciences, New York University, New York, USA; (3) Big Hat Biosciences, San Mateo, CA, USA.
Pseudocode | Yes | Algorithm 1: the BayesOpt outer loop. (A generic sketch of such a loop follows the table.)
Open Source Code | Yes | Code: github.com/samuelstanton/lambo.
Open Datasets | Yes | The original ZINC logP optimization task, popularized in the BayesOpt community by Gómez-Bombarelli et al. (2018). The SELFIES vocabulary was precomputed from the entire ZINC dataset (Krenn et al., 2020). We use the DRD3 docking score oracle from Huang et al. (2021). First we searched FPBase for all red-spectrum... proteins with known 3D structures. (A SELFIES vocabulary sketch follows the table.)
Dataset Splits | Yes | We used weight decay (1e-4) and reserved 10% of all collected data (including online queries) as validation data for early stopping. (A split-and-early-stopping sketch follows the table.)
Hardware Specification | Yes | In fact, we used an Nvidia RTX 8000 GPU with 48 GB of memory just to produce Figure C.1.
Software Dependencies | No | The paper mentions software such as PyTorch (Paszke et al., 2019), BoTorch (Balandat et al., 2020), and GPyTorch (Gardner et al., 2018), but does not provide version numbers for these components. (A version-logging snippet follows the table.)
Experiment Setup | Yes | Appendix B.4 provides a detailed table of hyperparameters covering sequence optimization, DAE architecture, and DAE training, including query batch size (b) = 16, # inner-loop gradient steps (j_max) = 32, inner-loop step size (η) = 0.1, entropy penalty (λ) = 1e-2, DAE learning rate (MTGP head) = 5e-3, and many others. (These values are collected in the config sketch after the table.)
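
Algorithm 1 in the paper is the standard BayesOpt outer loop: fit a surrogate on the data gathered so far, maximize an acquisition function to pick a query batch, score the batch with the objective, and append the results. The sketch below is not the paper's algorithm (LaMBO works on discrete sequences through a shared DAE encoder with a multi-task GP head); it is a minimal continuous toy using BoTorch and a single-task GP, with a made-up quadratic objective, to illustrate the loop structure only.

import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from botorch.acquisition import qExpectedImprovement
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood

def objective(x):
    # Hypothetical black-box objective standing in for the real oracle.
    return -(x - 0.3).pow(2).sum(dim=-1, keepdim=True)

bounds = torch.tensor([[0.0], [1.0]], dtype=torch.double)   # 2 x d box constraints
train_x = torch.rand(8, 1, dtype=torch.double)              # initial design
train_y = objective(train_x)

for _ in range(10):                                         # BayesOpt outer loop
    model = SingleTaskGP(train_x, train_y)                  # refit surrogate
    fit_gpytorch_mll(ExactMarginalLogLikelihood(model.likelihood, model))
    acqf = qExpectedImprovement(model, best_f=train_y.max())
    candidates, _ = optimize_acqf(
        acqf, bounds=bounds, q=4, num_restarts=5, raw_samples=64,
    )                                                       # propose a query batch
    train_x = torch.cat([train_x, candidates])
    train_y = torch.cat([train_y, objective(candidates)])   # score and append

print("best value found:", train_y.max().item())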
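
The Open Datasets row notes that the SELFIES vocabulary was precomputed from the entire ZINC dataset (Krenn et al., 2020). Below is a minimal sketch of how such a vocabulary can be built from SMILES strings with the selfies package; the three molecules are illustrative placeholders, not the actual ZINC preprocessing from the lambo repository.

import selfies as sf

# Illustrative SMILES strings; the paper builds the vocabulary over all of ZINC.
smiles = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]

selfies_strings = [sf.encoder(s) for s in smiles]            # SMILES -> SELFIES
alphabet = sf.get_alphabet_from_selfies(selfies_strings)     # set of SELFIES tokens
vocab = sorted(alphabet) + ["[nop]"]                         # selfies' padding symbol
token_to_id = {tok: i for i, tok in enumerate(vocab)}

print(len(vocab), "tokens:", vocab)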
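
The Dataset Splits row reports that 10% of all collected data is reserved as validation data for early stopping, with weight decay 1e-4. The sketch below mirrors that recipe on a hypothetical regression head and random tensors; only the split fraction and weight decay come from the paper, while the model, learning rate, and patience are illustrative.

import torch
from torch import nn

X = torch.randn(500, 32)                      # stand-in for collected sequence features
Y = torch.randn(500, 1)                       # stand-in for measured objective values

n_val = int(0.1 * len(X))                     # reserve 10% for validation
perm = torch.randperm(len(X))
val_idx, train_idx = perm[:n_val], perm[n_val:]

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
loss_fn = nn.MSELoss()

best_val, best_state, patience, bad_epochs = float("inf"), None, 10, 0
for epoch in range(200):
    model.train()
    opt.zero_grad()
    loss_fn(model(X[train_idx]), Y[train_idx]).backward()
    opt.step()

    model.eval()
    with torch.no_grad():
        val_loss = loss_fn(model(X[val_idx]), Y[val_idx]).item()
    if val_loss < best_val:                   # early stopping on validation loss
        best_val, bad_epochs = val_loss, 0
        best_state = {k: v.clone() for k, v in model.state_dict().items()}
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break

model.load_state_dict(best_state)             # restore the best checkpoint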
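
Because PyTorch, BoTorch, and GPyTorch are cited without pinned versions, anyone reproducing the experiments has to record their own environment. One simple way to log it, assuming those packages are installed:

import botorch
import gpytorch
import torch

# Record the exact library versions used for a reproduction run.
print("torch   ", torch.__version__)
print("gpytorch", gpytorch.__version__)
print("botorch ", botorch.__version__)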
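
For convenience, the subset of Appendix B.4 hyperparameters quoted in the Experiment Setup row can be collected into a single config mapping; the key names below are illustrative, only the values come from the paper.

# Hyperparameter values quoted from Appendix B.4; key names are illustrative.
lambo_hparams = {
    "query_batch_size_b": 16,           # query batch size (b)
    "inner_loop_grad_steps_jmax": 32,   # number of inner-loop gradient steps (j_max)
    "inner_loop_step_size_eta": 0.1,    # inner-loop step size (eta)
    "entropy_penalty_lambda": 1e-2,     # entropy penalty (lambda)
    "dae_lr_mtgp_head": 5e-3,           # DAE learning rate (MTGP head)
}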