Offline Model-Based Optimization via Policy-Guided Gradient Search

Authors: Yassine Chemingui, Aryan Deshwal, Trong Nghia Hoang, Janardhan Rao Doppa

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical results on multiple benchmarks demonstrate that the learned optimization policy can be combined with existing offline surrogates to significantly improve the optimization performance.
Researcher Affiliation | Academia | School of EECS, Washington State University {yassine.chemingui, aryan.deshwal, trongnghia.hoang, jana.doppa}@wsu.edu
Pseudocode | Yes | Algorithm 1: Policy-guided Gradient Search (PGS) ... Algorithm 2: Learning to Guide Gradient Search
Open Source Code | Yes | The code for PGS is publicly available at https://github.com/yassineCh/PGS.
Open Datasets | Yes | We employ six challenging benchmark tasks (and corresponding datasets) from diverse domains. All these datasets and the oracle evaluations are accessed via the design-bench benchmark (Trabucco et al. 2022).
Dataset Splits | No | The paper does not provide the dataset split information (exact percentages, sample counts, or citations to predefined splits) needed to reproduce the partitioning into training, validation, and test sets. It mentions the benchmark datasets but not the splits used in the experiments, beyond the trajectories constructed for RL training.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions using a 'vanilla multilayer perceptron', 'CQL', and 'VPNs' and provides links to their implementations, but does not specify version numbers for these or other software libraries/dependencies (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | For each task, we normalize inputs and outputs before we train a vanilla multilayer perceptron, f̂_θ, with two hidden layers with 2048 units and ReLU activation. f̂_θ is trained to minimize the mean squared error of function values. ... We evaluate four different values of p = {10, 20, 30, 40} for Top p percentile data and number of epochs of CQL ranging from 50 to 400 in increments of 50 and picked the configuration with the best OSEL performance.
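
To make the quoted setup concrete, below is a minimal PyTorch sketch of the two ingredients the table refers to: a surrogate MLP with two hidden layers of 2048 units and ReLU, trained with MSE on normalized data, and a gradient-based search loop whose step sizes come from a guidance policy. This is an illustration under assumptions, not the authors' released implementation: the `policy_step` interface, the Adam optimizer, the learning rate, and the epoch count are placeholders, and the actual PGS policy is trained offline with CQL as described in the Experiment Setup row.

```python
# Minimal sketch (not the authors' code): surrogate training plus
# policy-guided gradient search for offline model-based optimization.
import torch
import torch.nn as nn


class Surrogate(nn.Module):
    """Vanilla MLP surrogate: two hidden layers of 2048 units with ReLU."""

    def __init__(self, input_dim: int, hidden: int = 2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)


def train_surrogate(model, x, y, epochs: int = 100, lr: float = 1e-3):
    """Fit the surrogate with MSE on normalized inputs/outputs (as quoted above).
    Optimizer, learning rate, and epoch count are assumptions for illustration."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return model


def guided_gradient_search(model, x0, policy_step, num_steps: int = 50):
    """Ascend the surrogate from a starting design x0. The hypothetical
    `policy_step(x, t)` callable returns a step size and stands in for the
    learned guidance policy of PGS."""
    x = x0.clone().requires_grad_(True)
    for t in range(num_steps):
        score = model(x).sum()
        grad = torch.autograd.grad(score, x)[0]
        step = policy_step(x.detach(), t)  # assumed interface, not from the paper
        x = (x.detach() + step * grad).requires_grad_(True)
    return x.detach()
```

Setting `policy_step = lambda x, t: 0.05` reduces the loop to plain gradient ascent on the surrogate; the point of PGS is to replace that fixed step with a policy learned via offline RL (CQL) from trajectories built on the Top p percentile data mentioned in the quoted setup.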