Robust Model-Based Optimization for Challenging Fitness Landscapes

Authors: Saba Ghaffari, Ehsan Saleh, Alex Schwing, Yu-Xiong Wang, Martin D. Burke, Saurabh Sinha

ICLR 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our comprehensive benchmark on real and semi-synthetic protein datasets, as well as solution design for physics-informed neural networks, showcases the generality of our approach in discrete and continuous design spaces. |
| Researcher Affiliation | Academia | 1University of Illinois Urbana-Champaign, 2Georgia Institute of Technology. {sabag2, ehsans2, aschwing, yxw, mdburke}@illinois.edu, saurabh.sinha@bme.gatech.edu |
| Pseudocode | No | The paper describes the proposed method using mathematical equations and textual explanations, but it does not include a dedicated pseudocode or algorithm block. |
| Open Source Code | Yes | Our implementation is available at https://github.com/sabagh1994/PGVAE. |
| Open Datasets | Yes | Our comprehensive benchmark on real and semi-synthetic protein datasets, as well as solution design for physics-informed neural networks, showcases the generality of our approach in discrete and continuous design spaces. ... The dataset contains synthetic property values that are monotonically decreasing with the digit class. ... We chose the popular AAV (adeno-associated virus) dataset (Bryant et al., 2021) ... The dataset was obtained from (Dallago et al., 2021). ... two popular protein datasets GB1 (Wu et al., 2016) and PhoQ (Podgornaia & Laub, 2015). |
| Dataset Splits | No | The paper describes how "train sets were generated" for the various datasets and how imbalance and separation were varied, but it does not specify explicit training/validation/test splits by percentage or sample count, so the partitioning of the original data cannot be reproduced. The evaluation metric Ymax measures the relative improvement of the best property found, implying an iterative search process rather than a standard held-out validation set. |
| Hardware Specification | No | The paper mentions that the work 'utilized resources supported by 1) the National Science Foundation's Major Research Instrumentation program, grant No. 1725729 (Kindratenko et al., 2020), and 2) the Delta advanced computing and data resource which is supported by the National Science Foundation (award OAC 2005572) and the State of Illinois.' However, these descriptions do not include specific hardware details such as GPU/CPU models or memory specifications. |
| Software Dependencies | No | The paper mentions specific VAE architectures (Table A2) and relies on implementations by Brookes et al. (2019) for baseline methods. However, it does not list software dependencies with version numbers, such as programming languages or deep learning frameworks (e.g., 'Python 3.8', 'PyTorch 1.9'). |
| Experiment Setup | Yes | We performed 10 rounds of MBO on the GMM benchmark and 20 rounds of MBO on the rest of the benchmark datasets. In all experiments the temperature (τ) was set to five for PPGVAE with no further tuning. We used the implementation and hyper-parameters provided by (Brookes et al., 2019) for the CbAS, Bombarelli, RWR, and CEM-PI methods. The architecture of the VAE was the same for all methods (Table A2). |
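The experiment-setup row describes an iterative model-based optimization (MBO) protocol: several rounds in which samples are weighted by their property values under a temperature τ before the generative model is refit and new candidates are proposed. The following toy sketch illustrates only that loop structure; it is not the paper's PPGVAE implementation. It substitutes a Gaussian perturbation for the VAE generator, and all function names (`property_weights`, `mbo_rounds`), the 1-D oracle, and the perturbation scale `sigma` are hypothetical choices for illustration.

```python
import math
import random

def property_weights(ys, tau=5.0):
    """Exponential property weights exp(y / tau), normalized to sum to 1.
    Higher-property samples get more weight; larger tau flattens the weighting.
    (A stand-in for the property-based reweighting described in the paper.)"""
    m = max(ys)  # subtract the max for numerical stability
    w = [math.exp((y - m) / tau) for y in ys]
    s = sum(w)
    return [x / s for x in w]

def mbo_rounds(oracle, init_xs, rounds=10, samples_per_round=50,
               tau=5.0, sigma=0.3, seed=0):
    """Run `rounds` rounds of toy MBO: reweight the pool by property value,
    resample parents in proportion to the weights, perturb them to generate
    candidates (in place of a VAE), and score them with the oracle.
    Returns the best property value found (the Ymax-style metric)."""
    rng = random.Random(seed)
    xs = list(init_xs)
    ys = [oracle(x) for x in xs]
    for _ in range(rounds):
        w = property_weights(ys, tau)
        parents = rng.choices(xs, weights=w, k=samples_per_round)
        cands = [p + rng.gauss(0.0, sigma) for p in parents]
        xs.extend(cands)
        ys.extend(oracle(x) for x in cands)
    return max(ys)

# Toy oracle with its optimum at x = 2; the initial pool sits far from it.
best = mbo_rounds(lambda x: -(x - 2.0) ** 2, [-1.0, 0.0, 0.5], rounds=10)
```

With τ = 5 (the value the paper reports using for PPGVAE) the weighting is mild; shrinking τ concentrates sampling on the current best candidates, which is the trade-off the temperature controls.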