Design-Bench: Benchmarks for Data-Driven Offline Model-Based Optimization

Authors: Brandon Trabucco, Xinyang Geng, Aviral Kumar, Sergey Levine

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To address this, we present Design-Bench, a benchmark for offline MBO with a unified evaluation protocol and reference implementations of recent methods. Our benchmark includes a suite of diverse and realistic tasks derived from real-world optimization problems in biology, materials science, and robotics that present distinct challenges for offline MBO. Our benchmark and reference implementations are released at github.com/rail-berkeley/design-bench and github.com/rail-berkeley/design-baselines." and "We systematically evaluate them on all of the proposed benchmark tasks and report results."
Researcher Affiliation | Academia | "1 Machine Learning Department, Carnegie Mellon University; 2 Department of Electrical Engineering and Computer Sciences, University of California, Berkeley."
Pseudocode | No | No pseudocode or clearly labeled algorithm block is present in the paper.
Open Source Code | Yes | "Our benchmark and reference implementations are released at github.com/rail-berkeley/design-bench and github.com/rail-berkeley/design-baselines." (A task-loading sketch appears after the table.)
Open Datasets | Yes | "We adapt a real-world dataset proposed by Hamidieh (2018)." and "ChEMBL (Gaulton et al., 2012)." and "The TF Bind 8 and TF Bind 10 tasks are derivatives of the transcription factor binding activity survey performed by Barrera et al. (2016)."
Dataset Splits | Yes | "For TF Bind 8, we sample 32898 of all the sequences, and for TF Bind 10 we sample 50000 sequences to form the training set." and "The full dataset used to learn approximate oracles for evaluating MBO methods has 21263 samples, but we restrict this number to 17010 (the 80th percentile) for the training set of offline MBO methods to increase difficulty." and "We sample 2440 total designs, and select the bottom performing 70% to be our training set. This gives us 1771 samples in total." (A split-construction sketch appears after the table.)
Hardware Specification | Yes | "We ran our experiments on a single server with 2 Intel Xeon E5-2698 v4 CPUs and 8 Nvidia Tesla V100 GPUs."
Software Dependencies | No | The paper mentions software such as scikit-learn, PyTorch, MuJoCo, OpenAI Gym, and Stable Baselines, but does not specify their version numbers.
Experiment Setup | Yes | "In practice we perform T = 200 gradient steps, scaling the learning rate as α ← α/d, where d is the dimension of the design space." and "For discrete tasks, only the objective values are normalized, and optimization is performed over log-probabilities of designs." and "train it for 20 epochs using batch size 256" and "The second tunable parameter of COMs is the number of gradient ascent steps to perform when optimizing x, and is uniformly chosen to be 50. The final parameter is the learning rate used when optimizing x, which is uniformly chosen to be 2/d for all discrete tasks and 0.05/d for all continuous tasks, where d is the cardinality of the design space." (An optimization sketch appears after the table.)
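
The Open Source Code row points to the released design-bench package. The sketch below shows one plausible way to load a task and inspect its offline dataset, assuming the package is installed and that the task identifier "TFBind8-Exact-v0" and the make/predict interface match the released version; the repository README lists the exact registered task names.

```python
# Hypothetical usage of the design-bench package (interface assumed from the
# public repository; task names and methods may differ across versions).
import design_bench

# Load an offline MBO task; 'TFBind8-Exact-v0' is one of the task identifiers
# described in the benchmark.
task = design_bench.make("TFBind8-Exact-v0")

# The offline dataset: candidate designs x and their measured objective values y.
print("designs:", task.x.shape)      # e.g. (N, sequence_length) for discrete tasks
print("objectives:", task.y.shape)   # (N, 1)

# Score a batch of proposed designs with the task's oracle.
scores = task.predict(task.x[:10])
print("oracle scores:", scores.ravel())
```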
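
The Dataset Splits row describes how the offline training sets are built by thresholding the full data at a percentile of the objective value (e.g. the 80th percentile for Superconductor, the bottom 70% for the 2440-design task). A minimal NumPy sketch of that construction, with hypothetical full_x/full_y arrays standing in for a task's complete dataset:

```python
import numpy as np

def make_offline_split(full_x, full_y, keep_fraction=0.8):
    """Keep only the designs whose objective value falls at or below the given
    percentile, mimicking the percentile-thresholded training sets described
    in the paper (e.g. the 80th percentile for Superconductor)."""
    threshold = np.percentile(full_y, keep_fraction * 100.0)
    mask = full_y.ravel() <= threshold
    return full_x[mask], full_y[mask]

# Example with synthetic data standing in for a real task's dataset.
rng = np.random.default_rng(0)
full_x = rng.normal(size=(21263, 86))
full_y = rng.normal(size=(21263, 1))
train_x, train_y = make_offline_split(full_x, full_y, keep_fraction=0.8)
print(train_x.shape[0], "of", full_x.shape[0], "samples kept")
```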
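
The Experiment Setup row describes a shared optimization recipe: a fixed number of gradient-ascent steps on a learned surrogate, with the step size scaled by the design dimensionality d (e.g. 0.05/d for continuous tasks in COMs). Below is a minimal PyTorch sketch of that recipe, assuming a hypothetical trained surrogate `model` mapping continuous designs to predicted scores; discrete tasks, which optimize over log-probabilities of designs, are not covered here.

```python
import torch

def gradient_ascent(model, x_init, steps=200, base_lr=0.05):
    """Optimize designs against a learned surrogate by gradient ascent,
    scaling the learning rate by the design dimension d as described in the paper."""
    d = x_init.shape[-1]
    lr = base_lr / d                       # e.g. 0.05 / d for continuous tasks
    x = x_init.clone().requires_grad_(True)
    for _ in range(steps):                 # T = 200 steps in the quoted setup
        score = model(x).sum()             # per-design scores; sum gives per-design grads
        grad, = torch.autograd.grad(score, x)
        with torch.no_grad():
            x += lr * grad                 # ascend the surrogate's prediction
    return x.detach()

# Example with a toy surrogate standing in for a trained model.
model = torch.nn.Sequential(torch.nn.Linear(60, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 1))
x_start = torch.randn(128, 60)             # 128 candidate designs of dimension 60
x_opt = gradient_ascent(model, x_start)
```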