Data-Driven Offline Decision-Making via Invariant Representation Learning

Authors: Han Qi, Yi Su, Aviral Kumar, Sergey Levine

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we evaluate IOM on several tasks from Design-Bench [43] and find that it outperforms the best prior methods, and additionally, admits appealing offline tuning strategies unlike the prior methods. We evaluate on four tasks with continuous-valued input space from the Design-Bench [43] benchmark for offline model-based optimization (MBO). In Figure 2 (left), we show the median and interquartile mean (IQM) [1] for the aggregated scores across all tasks. (A short sketch of the median/IQM computation appears after this table.)
Researcher Affiliation | Collaboration | Han Qi, Yi Su, Aviral Kumar, Sergey Levine. Department of Electrical Engineering and Computer Sciences, UC Berkeley. {han2019, aviralk}@berkeley.edu, yisumtv@google.com
Pseudocode | Yes | Pseudocode for IOM is shown in Algorithm 1. (See Algorithm 1 on page 4.)
Open Source Code | No | The paper thanks the authors of Design-Bench and COMs for help in setting up their codebases, but does not state that code for the method proposed in this paper (IOM) is open source or provide a link to it.
Open Datasets | Yes | We evaluate on four tasks with continuous-valued input space from the Design-Bench [43] benchmark for offline model-based optimization (MBO). [43] Brandon Trabucco, Xinyang Geng, Aviral Kumar, and Sergey Levine. Design-Bench: Benchmarks for data-driven offline model-based optimization, 2021. URL https://github.com/brandontrabucco/design-bench.
Dataset Splits | No | Then for each run, we record the validation in-distribution error and the value of the invariance regularizer on a validation set. Second, we now pick models that attain good performance within the training distribution by selecting λ values that attain the smallest validation prediction error: (f(φ(x)) − y)², in addition to picking the early stopping point based on the smallest validation error. While a validation set is mentioned and used for tuning, the paper does not specify the explicit split percentages or sample counts used to create the validation set from the full dataset. (A sketch of this λ-selection procedure appears after this table.)
Hardware Specification | No | The paper mentions "compute resources from Google cloud" but does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | Yes | All experiments were run with PyTorch 1.8.1 and Python 3.8.5.
Experiment Setup | Yes | We model the representation φ(x) and the learned function f(·) each as two-hidden-layer ReLU networks with sizes 2048 and 1024, respectively. Input: training data D, number of gradient steps T = 50 to optimize µ_OPT starting from the training distribution µ, training iteration K, batch size n. λ denotes a weighting hyperparameter. (The PyTorch sketch after this table illustrates these settings.)
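
The Research Type row quotes the evaluation protocol: scores aggregated across tasks are summarized by the median and the interquartile mean (IQM) [1]. Below is a minimal NumPy sketch of that computation; the score values are placeholders, not results from the paper.

```python
import numpy as np

def interquartile_mean(scores):
    """Mean of the middle 50% of the scores (drop the bottom and top quartiles)."""
    s = np.sort(np.asarray(scores).ravel())
    lo, hi = int(np.floor(0.25 * s.size)), int(np.ceil(0.75 * s.size))
    return s[lo:hi].mean()

# Placeholder normalized scores aggregated across tasks and seeds (not values from the paper).
scores = [0.62, 0.71, 0.68, 0.90, 0.55, 0.74, 0.81, 0.66]
print("median:", np.median(scores))
print("IQM:", interquartile_mean(scores))
```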
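
The Pseudocode and Experiment Setup rows quote the network sizes (two-hidden-layer ReLU networks of width 2048 for φ and 1024 for f) and the hyperparameters D, T = 50, K, n, and λ. The PyTorch sketch below assembles these pieces into a simplified training loop under several assumptions: the input and representation dimensions, the learning rate, the design step size, and the `sample_batch` loader are hypothetical, and the mean-matching penalty is only a stand-in for the paper's invariance regularizer, whose exact form is not quoted here. It is an illustration of the overall structure, not the authors' implementation.

```python
import torch
import torch.nn as nn

def mlp(d_in, width, d_out):
    """Two-hidden-layer ReLU network, matching the widths quoted above."""
    return nn.Sequential(
        nn.Linear(d_in, width), nn.ReLU(),
        nn.Linear(width, width), nn.ReLU(),
        nn.Linear(width, d_out),
    )

d_x, d_z = 60, 64                  # hypothetical input / representation dims (task-dependent)
phi = mlp(d_x, 2048, d_z)          # representation phi(x), width 2048
f = mlp(d_z, 1024, 1)              # learned function f(.), width 1024
opt = torch.optim.Adam(list(phi.parameters()) + list(f.parameters()), lr=3e-4)

lam, T, K, n = 0.1, 50, 1000, 128  # lambda, design steps, training iterations, batch size

def optimize_designs(x_init, steps=T, step_size=0.01):
    """T gradient-ascent steps on f(phi(x)): samples from the optimized distribution mu_OPT."""
    x = x_init.clone().detach().requires_grad_(True)
    for _ in range(steps):
        (grad,) = torch.autograd.grad(f(phi(x)).sum(), x)
        x = (x + step_size * grad).detach().requires_grad_(True)
    return x.detach()

for k in range(K):
    x, y = sample_batch(n)                        # hypothetical loader over the training data D
    x_opt = optimize_designs(x)                   # designs drawn from mu_OPT
    pred_loss = ((f(phi(x)).squeeze(-1) - y) ** 2).mean()
    # Stand-in for the invariance regularizer: match the mean representation
    # under the data distribution and under the optimized design distribution.
    inv_reg = (phi(x).mean(0) - phi(x_opt).mean(0)).pow(2).sum()
    loss = pred_loss + lam * inv_reg
    opt.zero_grad()
    loss.backward()
    opt.step()
```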
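
The Dataset Splits row describes the paper's offline tuning recipe: for each λ, pick the early-stopping checkpoint with the smallest validation prediction error, then select the λ whose best checkpoint attains the smallest such error. A minimal sketch of that selection logic, with made-up error values, is given below.

```python
import numpy as np

# Hypothetical validation prediction errors, one list per lambda value,
# recorded at successive evaluation checkpoints during training.
val_error = {
    0.01: [0.42, 0.35, 0.33, 0.36],
    0.1:  [0.40, 0.31, 0.29, 0.30],
    1.0:  [0.45, 0.38, 0.37, 0.39],
}

# Early stopping: keep the checkpoint with the smallest validation error for each lambda.
best = {lam: (int(np.argmin(errs)), float(np.min(errs))) for lam, errs in val_error.items()}
# Offline tuning: choose the lambda whose best checkpoint attains the smallest validation error.
chosen = min(best, key=lambda lam: best[lam][1])
print("selected lambda:", chosen, "| checkpoint:", best[chosen][0], "| val error:", best[chosen][1])
```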