Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders
Authors: Samuel Stanton, Wesley Maddox, Nate Gruver, Phillip Maffettone, Emily Delaney, Peyton Greenside, Andrew Gordon Wilson
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate LaMBO on two small-molecule design tasks, and introduce new tasks optimizing in silico and in vitro properties of large-molecule fluorescent proteins. In our experiments LaMBO outperforms genetic optimizers and does not require a large pretraining corpus, demonstrating that BayesOpt is practical and effective for biological sequence design. |
| Researcher Affiliation | Collaboration | (1) Center for Data Science, New York University, New York, USA; (2) Courant Institute of Mathematical Sciences, New York University, New York, USA; (3) Big Hat Biosciences, San Mateo, CA, USA. |
| Pseudocode | Yes | Algorithm 1: The BayesOpt outer loop. (A generic, hedged sketch of such an outer loop is given after this table.) |
| Open Source Code | Yes | Code here: github.com/samuelstanton/lambo. |
| Open Datasets | Yes | The original ZINC logP optimization task, popularized in the BayesOpt community by Gómez-Bombarelli et al. (2018). The SELFIES vocabulary was precomputed from the entire ZINC dataset (Krenn et al., 2020). We use the DRD3 docking score oracle from Huang et al. (2021). First we searched FPBase for all red-spectrum... proteins with known 3D structures. |
| Dataset Splits | Yes | We used weight decay (1e-4) and reserved 10% of all collected data (including online queries) as validation data for early stopping. |
| Hardware Specification | Yes | In fact, we used an Nvidia RTX 8000 GPU with 48 GB of memory just to produce Figure C.1. |
| Software Dependencies | No | The paper mentions software like "PyTorch (Paszke et al., 2019), BoTorch (Balandat et al., 2020), and GPyTorch (Gardner et al., 2018)" but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | Appendix B.4 provides a detailed table of "Hyperparameters" including "Sequence Optimization", "DAE Architecture", and "DAE Training" parameters such as Query batch size (b) 16, # Inner loop gradient steps (jmax) 32, Inner loop step size (η) 0.1, Entropy penalty (λ) 1e-2, DAE learning rate (MTGP head) 5e-3, and many others. (The values quoted in this table are collected into an illustrative config after the table.) |
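The paper presents its outer loop only as pseudocode (Algorithm 1) and releases code at github.com/samuelstanton/lambo. The snippet below is a minimal, hedged Python sketch of a generic batched BayesOpt outer loop, not the LaMBO implementation; `oracle`, `fit_surrogate`, and `propose_batch` are hypothetical user-supplied callables.

```python
# Minimal sketch of a generic batched BayesOpt outer loop (not the LaMBO code).
# `oracle`, `fit_surrogate`, and `propose_batch` are hypothetical user-supplied callables.

def bayes_opt_outer_loop(oracle, fit_surrogate, propose_batch,
                         initial_data, num_rounds, query_batch_size=16):
    """Iteratively fit a surrogate, propose a query batch, and fold new labels back in."""
    data = list(initial_data)  # list of (sequence, score) pairs
    for _ in range(num_rounds):
        surrogate = fit_surrogate(data)                            # e.g. a GP head on a learned encoder
        batch = propose_batch(surrogate, data, query_batch_size)   # maximize an acquisition function
        labels = [oracle(seq) for seq in batch]                    # query the expensive objective
        data.extend(zip(batch, labels))                            # augment the dataset for the next round
    return data
```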
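For convenience, the hyperparameter values quoted in the rows above (Appendix B.4 plus the dataset-split row) can be gathered into a single config. Only values explicitly stated in the quotes are included; the dictionary keys are illustrative names of ours, not the paper's.

```python
# Illustrative collection of the hyperparameter values quoted above; key names are ours.
LAMBO_HPARAMS = {
    "query_batch_size": 16,        # b
    "inner_loop_grad_steps": 32,   # j_max
    "inner_loop_step_size": 0.1,   # eta
    "entropy_penalty": 1e-2,       # lambda
    "dae_lr_mtgp_head": 5e-3,      # DAE learning rate (MTGP head)
    "weight_decay": 1e-4,
    "validation_fraction": 0.10,   # held out from all collected data for early stopping
}
```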