Conservative Objective Models for Effective Offline Model-Based Optimization

Authors: Brandon Trabucco, Aviral Kumar, Xinyang Geng, Sergey Levine

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In practice, COMs outperform a number of existing methods on a wide range of MBO problems, including optimizing controller parameters, robot morphologies, and superconducting materials. (from Abstract) To evaluate the efficacy of COMs for offline model-based optimization, we first perform a comparative evaluation of COMs on four continuous offline MBO tasks based on problems in physical sciences, neural network design, and robotics, proposed in the design-bench benchmark (Trabucco et al., 2021). In addition, we perform an empirical analysis on COMs that aims to answer the following questions: (1) Is conservative training essential for improved performance and stability of COMs? How do COMs compare to a naïve objective model in terms of stability? (2) How does the trust-region optimizer improve the stability of optimizing COMs? (3) Are COMs robust to hyperparameter choices and consistent across evaluation conditions? (from Section 6, Experimental Evaluation)
Researcher Affiliation | Academia | Brandon Trabucco, Aviral Kumar, Xinyang Geng, Sergey Levine; Department of Electrical Engineering and Computer Sciences, University of California, Berkeley. Correspondence to: Brandon Trabucco <btrabucco@berkeley.edu>, Aviral Kumar <aviralk@berkeley.edu>.
Pseudocode | Yes | Algorithm 1 (COM: Training Conservative Models) and Algorithm 2 (COM: Finding x*), both on page 4; a minimal sketch of both procedures appears below the table.
Open Source Code | Yes | Code for reproducing our experimental results is available at https://github.com/brandontrabucco/design-baselines. (from Section 6)
Open Datasets | Yes | To evaluate the efficacy of COMs for offline model-based optimization, we first perform a comparative evaluation of COMs on four continuous offline MBO tasks based on problems in physical sciences, neural network design, and robotics, proposed in the design-bench benchmark (Trabucco et al., 2021). (from Section 6)
Dataset Splits | No | The paper describes using static datasets from the design-bench benchmark but does not provide explicit details on training, validation, or test set splits, such as percentages or sample counts, within the paper itself.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments.
Software Dependencies | No | The paper mentions using the 'Adam optimizer (Kingma & Ba, 2015)' but does not specify software versions for libraries such as PyTorch, TensorFlow, or Python itself.
Experiment Setup | Yes | Briefly, for all of our experiments, the conservative objective model f̂_θ is modeled as a neural network with two hidden layers of size 2048 each and leaky ReLU activations... In order to train this conservative objective model, we use the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 10^-3... During optimization, we utilized the trust-region gradient-ascent optimizer with β = 0.9... Finally, in order to choose the time step T in Equation 4 that is supposed to provide us with the final solution x* = x_T, we pick a large and universal time step of T = 450. (A hedged code sketch of this setup follows the table.)
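To make the Pseudocode row concrete, the following is a minimal PyTorch sketch of Algorithm 1 (conservative training): it mines designs on which the learned model f̂_θ overestimates by running gradient ascent on the model's own prediction, then penalizes their predicted scores relative to dataset points while fitting the usual regression loss. The function names, the fixed penalty weight alpha, and the inner ascent hyperparameters (steps, lr) are illustrative assumptions, not the paper's values; the paper also tunes α adaptively via a Lagrangian, which is omitted here.

```python
import torch

def adversarial_samples(model, x, steps=50, lr=0.05):
    # Inner loop of Algorithm 1: gradient ascent on the model's own
    # prediction to find designs mu(x) where it overestimates.
    # steps and lr are illustrative, not the paper's values.
    x_adv = x.clone().requires_grad_(True)
    for _ in range(steps):
        (grad,) = torch.autograd.grad(model(x_adv).sum(), x_adv)
        x_adv = (x_adv + lr * grad).detach().requires_grad_(True)
    return x_adv.detach()

def conservative_training_step(model, optimizer, x, y, alpha=1.0):
    # One step of Algorithm 1: standard regression loss plus a penalty
    # that pushes predictions down on adversarial designs and up on
    # dataset designs. alpha is held fixed here for simplicity.
    x_adv = adversarial_samples(model, x)
    regression = 0.5 * (model(x).squeeze(-1) - y).pow(2).mean()
    penalty = model(x_adv).mean() - model(x).mean()
    loss = regression + alpha * penalty
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```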
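Likewise, a sketch of the Experiment Setup row under stated assumptions: the two-hidden-layer, 2048-unit leaky-ReLU architecture, the Adam learning rate of 10^-3, and the T = 450 ascent steps are quoted from the paper, while plain momentum with coefficient β = 0.9 stands in for the paper's trust-region gradient-ascent optimizer, whose exact update rule is not reproduced in this table. The ascent learning rate and input dimension are illustrative.

```python
import torch
import torch.nn as nn

def build_objective_model(input_dim):
    # Two hidden layers of 2048 units each with leaky ReLU activations,
    # as quoted in the Experiment Setup row.
    return nn.Sequential(
        nn.Linear(input_dim, 2048), nn.LeakyReLU(),
        nn.Linear(2048, 2048), nn.LeakyReLU(),
        nn.Linear(2048, 1),
    )

model = build_objective_model(input_dim=60)  # input_dim is task-dependent (assumed)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # quoted learning rate

def find_solution(model, x_init, T=450, lr=0.05, beta=0.9):
    # Algorithm 2 under the quoted settings: T = 450 ascent steps on the
    # design. Plain momentum with coefficient beta is a stand-in for the
    # paper's trust-region optimizer; lr is an illustrative assumption.
    x = x_init.clone().requires_grad_(True)
    velocity = torch.zeros_like(x)
    for _ in range(T):
        (grad,) = torch.autograd.grad(model(x).sum(), x)
        velocity = beta * velocity + grad
        x = (x + lr * velocity).detach().requires_grad_(True)
    return x.detach()
```

In the paper, optimization is initialized from the dataset, so find_solution would typically be called with a high-scoring dataset design as x_init and its output taken as the final solution x* = x_T.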