Offline Model-Based Optimization via Policy-Guided Gradient Search
Authors: Yassine Chemingui, Aryan Deshwal, Trong Nghia Hoang, Janardhan Rao Doppa
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results on multiple benchmarks demonstrate that the learned optimization policy can be combined with existing offline surrogates to significantly improve the optimization performance. |
| Researcher Affiliation | Academia | School of EECS, Washington State University {yassine.chemingui, aryan.deshwal, trongnghia.hoang, jana.doppa}@wsu.edu |
| Pseudocode | Yes | Algorithm 1: Policy-guided Gradient Search (PGS) ... Algorithm 2: Learning to Guide Gradient Search (a hedged sketch of the search loop follows the table) |
| Open Source Code | Yes | The code for PGS is publicly available at https://github.com/yassineCh/PGS. |
| Open Datasets | Yes | We employ six challenging benchmark tasks (and corresponding datasets) from diverse domains. All these datasets and the oracle evaluations are accessed via the design-bench benchmark (Trabucco et al. 2022). (A loading example follows the table.) |
| Dataset Splits | No | The paper does not provide the specific dataset split information (exact percentages, sample counts, or citations to predefined splits) needed to reproduce the data partitioning for training, validation, and testing. It mentions the benchmark datasets but not the splits used within the experiments, beyond the construction of training trajectories for RL. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using a 'vanilla multilayer perceptron', 'CQL', and 'VPNs' and provides links to their implementations, but does not specify version numbers for these or other software libraries/dependencies (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | For each task, we normalize inputs and outputs before we train a vanilla multilayer perceptron, f̂_θ, with two hidden layers of 2048 units and ReLU activation. f̂_θ is trained to minimize the mean squared error of function values. ... We evaluate four different values of p = {10, 20, 30, 40} for Top-p percentile data and number of epochs of CQL ranging from 50 to 400 in increments of 50, and picked the configuration with the best OSEL performance. (A hedged training sketch follows the table.) |
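
The Pseudocode row above cites Algorithm 1 (Policy-guided Gradient Search). The sketch below illustrates the general idea of that loop: take repeated gradient-ascent steps on the offline surrogate, with a learned policy guiding each update. This is a minimal sketch, not the authors' implementation; in particular, the assumption that the policy outputs a per-design step size scaling the surrogate gradient is ours, and all function names here are hypothetical.

```python
# Hypothetical sketch of policy-guided gradient search (cf. Algorithm 1).
# Assumption: the learned policy maps the current design to a step size that
# scales the surrogate gradient; the paper's action parameterization may differ.
import torch

def policy_guided_gradient_search(x0, surrogate, policy, num_steps=50):
    """Run num_steps gradient-ascent steps on the surrogate, guided by the policy."""
    x = x0.clone().requires_grad_(True)
    for _ in range(num_steps):
        score = surrogate(x).sum()                 # surrogate prediction of design quality
        grad = torch.autograd.grad(score, x)[0]    # ascent direction on the surrogate
        with torch.no_grad():
            step = policy(x)                       # assumed: per-design step size
            x = x + step * grad                    # policy-guided update
        x = x.detach().requires_grad_(True)
    return x.detach()
```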
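The Open Datasets row points to design-bench. For readers reproducing the setup, a typical loading pattern is shown below; the specific task name is one of the benchmark's registered tasks and is used here only as an illustration.

```python
# Loading an offline dataset and oracle through design-bench (Trabucco et al. 2022).
# The task name below is illustrative; PGS is evaluated on six design-bench tasks.
import design_bench

task = design_bench.make('TFBind8-Exact-v0')
x, y = task.x, task.y          # offline designs and their scores
scores = task.predict(x[:10])  # oracle evaluation of candidate designs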
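The Experiment Setup row fully specifies the surrogate architecture (two hidden layers of 2048 units, ReLU, MSE on normalized data). The sketch below renders that description as code; the framework (PyTorch), optimizer, and training-loop hyperparameters are assumptions on our part, not details from the paper.

```python
# Hypothetical sketch of the surrogate described in the Experiment Setup row.
import torch
import torch.nn as nn

class Surrogate(nn.Module):
    """Vanilla MLP: two hidden layers of 2048 units with ReLU activations."""
    def __init__(self, input_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 2048), nn.ReLU(),
            nn.Linear(2048, 2048), nn.ReLU(),
            nn.Linear(2048, 1),
        )

    def forward(self, x):
        return self.net(x)

def train_surrogate(x, y, epochs=200, lr=1e-3, batch_size=128):
    # Normalize inputs and outputs, as stated in the paper.
    x = (x - x.mean(0)) / (x.std(0) + 1e-8)
    y = (y - y.mean()) / (y.std() + 1e-8)
    model = Surrogate(x.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)  # optimizer is an assumption
    loader = torch.utils.data.DataLoader(
        torch.utils.data.TensorDataset(x, y), batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        for xb, yb in loader:
            loss = nn.functional.mse_loss(model(xb).squeeze(-1), yb)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```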