Design-Bench: Benchmarks for Data-Driven Offline Model-Based Optimization

Authors: Brandon Trabucco, Xinyang Geng, Aviral Kumar, Sergey Levine

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To address this, we present Design-Bench, a benchmark for offline MBO with a unified evaluation protocol and reference implementations of recent methods. Our benchmark includes a suite of diverse and realistic tasks derived from real-world optimization problems in biology, materials science, and robotics that present distinct challenges for offline MBO. Our benchmark and reference implementations are released at github.com/rail-berkeley/design-bench and github.com/rail-berkeley/design-baselines." and "We systematically evaluate them on all of the proposed benchmark tasks and report results."
Researcher Affiliation | Academia | "1 Machine Learning Department, Carnegie Mellon University; 2 Department of Electrical Engineering and Computer Sciences, University of California, Berkeley."
Pseudocode | No | No pseudocode or clearly labeled algorithm block is present in the paper.
Open Source Code | Yes | "Our benchmark and reference implementations are released at github.com/rail-berkeley/design-bench and github.com/rail-berkeley/design-baselines." (A task-loading sketch appears after the table.)
Open Datasets | Yes | "We adapt a real-world dataset proposed by Hamidieh (2018)." and "ChEMBL (Gaulton et al., 2012)." and "The TF Bind 8 and TF Bind 10 tasks are derivatives of the transcription factor binding activity survey performed by Barrera et al. (2016)."
Dataset Splits | Yes | "For TF Bind 8, we sample 32898 of all the sequences, and for TF Bind 10 we sample 50000 sequences to form the training set." and "The full dataset used to learn approximate oracles for evaluating MBO methods has 21263 samples, but we restrict this number to 17010 (the 80th percentile) for the training set of offline MBO methods to increase difficulty." and "We sample 2440 total designs, and select the bottom performing 70% to be our training set. This gives us 1771 samples in total." (A split-construction sketch appears after the table.)
Hardware Specification | Yes | "We ran our experiments on a single server with 2 Intel Xeon E5-2698 v4 CPUs and 8 Nvidia Tesla V100 GPUs."
Software Dependencies | No | The paper mentions software such as scikit-learn, PyTorch, MuJoCo, OpenAI Gym, and Stable Baselines, but does not specify their version numbers.
Experiment Setup | Yes | "In practice we perform T = 200 gradient steps, scaling the learning rate as α ← α/d, where d is the dimension of the design space." and "For discrete tasks, only the objective values are normalized, and optimization is performed over log-probabilities of designs." and "train it for 20 epochs using batch size 256" and "The second tunable parameter of COMs is the number of gradient ascent steps to perform when optimizing x, and is uniformly chosen to be 50. The final parameter is the learning rate used when optimizing x, which is uniformly chosen to be 2/d for all discrete tasks and 0.05/d for all continuous tasks, where d is the cardinality of the design space." (An optimization sketch appears after the table.)
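
The Open Source Code row points to the released design-bench package. The sketch below shows one plausible way to load a task and inspect its offline dataset, assuming the package is installed and that the task identifier "TFBind8-Exact-v0" and the make/predict interface match the released version; the repository README lists the exact registered task names.

```python
# Hypothetical usage of the design-bench package (interface assumed from the
# public repository; task names and methods may differ across versions).
import design_bench

# Load an offline MBO task; 'TFBind8-Exact-v0' is one of the task identifiers
# described in the benchmark.
task = design_bench.make("TFBind8-Exact-v0")

# The offline dataset: candidate designs x and their measured objective values y.
print("designs:", task.x.shape)      # e.g. (N, sequence_length) for discrete tasks
print("objectives:", task.y.shape)   # (N, 1)

# Score a batch of proposed designs with the task's oracle.
scores = task.predict(task.x[:10])
print("oracle scores:", scores.ravel())
```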
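
The Dataset Splits row describes how the offline training sets are built by thresholding the full data at a percentile of the objective value (e.g. the 80th percentile for Superconductor, the bottom 70% for the 2440-design task). A minimal NumPy sketch of that construction, with hypothetical full_x/full_y arrays standing in for a task's complete dataset:

```python
import numpy as np

def make_offline_split(full_x, full_y, keep_fraction=0.8):
    """Keep only the designs whose objective value falls at or below the given
    percentile, mimicking the percentile-thresholded training sets described
    in the paper (e.g. the 80th percentile for Superconductor)."""
    threshold = np.percentile(full_y, keep_fraction * 100.0)
    mask = full_y.ravel() <= threshold
    return full_x[mask], full_y[mask]

# Example with synthetic data standing in for a real task's dataset.
rng = np.random.default_rng(0)
full_x = rng.normal(size=(21263, 86))
full_y = rng.normal(size=(21263, 1))
train_x, train_y = make_offline_split(full_x, full_y, keep_fraction=0.8)
print(train_x.shape[0], "of", full_x.shape[0], "samples kept")
```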
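
The Experiment Setup row describes a shared optimization recipe: a fixed number of gradient-ascent steps on a learned surrogate, with the step size scaled by the design dimensionality d (e.g. 0.05/d for continuous tasks in COMs). Below is a minimal PyTorch sketch of that recipe, assuming a hypothetical trained surrogate `model` mapping continuous designs to predicted scores; discrete tasks, which optimize over log-probabilities of designs, are not covered here.

```python
import torch

def gradient_ascent(model, x_init, steps=200, base_lr=0.05):
    """Optimize designs against a learned surrogate by gradient ascent,
    scaling the learning rate by the design dimension d as described in the paper."""
    d = x_init.shape[-1]
    lr = base_lr / d                       # e.g. 0.05 / d for continuous tasks
    x = x_init.clone().requires_grad_(True)
    for _ in range(steps):                 # T = 200 steps in the quoted setup
        score = model(x).sum()             # per-design scores; sum gives per-design grads
        grad, = torch.autograd.grad(score, x)
        with torch.no_grad():
            x += lr * grad                 # ascend the surrogate's prediction
    return x.detach()

# Example with a toy surrogate standing in for a trained model.
model = torch.nn.Sequential(torch.nn.Linear(60, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 1))
x_start = torch.randn(128, 60)             # 128 candidate designs of dimension 60
x_opt = gradient_ascent(model, x_start)
```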