Offline Model-Based Optimization via Policy-Guided Gradient Search

Authors: Yassine Chemingui, Aryan Deshwal, Trong Nghia Hoang, Janardhan Rao Doppa

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical results on multiple benchmarks demonstrate that the learned optimization policy can be combined with existing offline surrogates to significantly improve the optimization performance.
Researcher Affiliation | Academia | School of EECS, Washington State University {yassine.chemingui, aryan.deshwal, trongnghia.hoang, jana.doppa}@wsu.edu
Pseudocode | Yes | Algorithm 1: Policy-guided Gradient Search (PGS) ... Algorithm 2: Learning to Guide Gradient Search
Open Source Code | Yes | The code for PGS is publicly available at https://github.com/yassineCh/PGS.
Open Datasets | Yes | We employ six challenging benchmark tasks (and corresponding datasets) from diverse domains. All these datasets and the oracle evaluations are accessed via the design-bench benchmark (Trabucco et al. 2022).
Dataset Splits | No | The paper does not provide the dataset split information (exact percentages, sample counts, or citations to predefined splits) needed to reproduce the partitioning into training, validation, and test sets. It mentions the benchmark datasets but not the splits used in the experiments, beyond the trajectories constructed for RL training.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions using a 'vanilla multilayer perceptron', 'CQL', and 'VPNs' and provides links to their implementations, but does not specify version numbers for these or other software libraries/dependencies (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | For each task, we normalize inputs and outputs before we train a vanilla multilayer perceptron, f̂_θ, with two hidden layers with 2048 units and ReLU activation. f̂_θ is trained to minimize the mean squared error of function values. ... We evaluate four different values of p = {10, 20, 30, 40} for Top p percentile data and number of epochs of CQL ranging from 50 to 400 in increments of 50 and picked the configuration with the best OSEL performance.
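
To make the quoted setup concrete, below is a minimal PyTorch sketch of the two ingredients the table refers to: a surrogate MLP with two hidden layers of 2048 units and ReLU, trained with MSE on normalized data, and a gradient-based search loop whose step sizes come from a guidance policy. This is an illustration under assumptions, not the authors' released implementation: the `policy_step` interface, the Adam optimizer, the learning rate, and the epoch count are placeholders, and the actual PGS policy is trained offline with CQL as described in the Experiment Setup row.

```python
# Minimal sketch (not the authors' code): surrogate training plus
# policy-guided gradient search for offline model-based optimization.
import torch
import torch.nn as nn


class Surrogate(nn.Module):
    """Vanilla MLP surrogate: two hidden layers of 2048 units with ReLU."""

    def __init__(self, input_dim: int, hidden: int = 2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)


def train_surrogate(model, x, y, epochs: int = 100, lr: float = 1e-3):
    """Fit the surrogate with MSE on normalized inputs/outputs (as quoted above).
    Optimizer, learning rate, and epoch count are assumptions for illustration."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return model


def guided_gradient_search(model, x0, policy_step, num_steps: int = 50):
    """Ascend the surrogate from a starting design x0. The hypothetical
    `policy_step(x, t)` callable returns a step size and stands in for the
    learned guidance policy of PGS."""
    x = x0.clone().requires_grad_(True)
    for t in range(num_steps):
        score = model(x).sum()
        grad = torch.autograd.grad(score, x)[0]
        step = policy_step(x.detach(), t)  # assumed interface, not from the paper
        x = (x.detach() + step * grad).requires_grad_(True)
    return x.detach()
```

Setting `policy_step = lambda x, t: 0.05` reduces the loop to plain gradient ascent on the surrogate; the point of PGS is to replace that fixed step with a policy learned via offline RL (CQL) from trajectories built on the Top p percentile data mentioned in the quoted setup.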