Pareto Policy Pool for Model-based Offline Reinforcement Learning
Authors: Yijun Yang, Jing Jiang, Tianyi Zhou, Jie Ma, Yuhui Shi
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On the D4RL benchmark for offline RL, P3 substantially outperforms several recent baseline methods over multiple tasks, especially when the quality of pre-collected experiences is low. ... This section aims to answer the following questions by evaluating P3 with other offline RL methods on the datasets from the D4RL Gym benchmark (Fu et al., 2020). |
| Researcher Affiliation | Academia | (1) Australian Artificial Intelligence Institute, University of Technology Sydney; (2) University of Washington, Seattle; (3) University of Maryland, College Park; (4) Department of Computer Science and Engineering, Southern University of Science and Technology |
| Pseudocode | Yes | Algorithm 1: Pareto policy pool (P3) for model-based offline RL; Algorithm 2: A two-stage method for solving constrained bi-objective optimization; Algorithm 3: Fitted Q evaluation (FQE) for Pareto policy selection |
| Open Source Code | Yes | Code is available at https://github.com/OverEuro/P3. |
| Open Datasets | Yes | We evaluate P3 and compare it with several state-of-the-art offline RL methods on the standard D4RL Gym benchmark (Fu et al., 2020). |
| Dataset Splits | Yes | We train an ensemble of N models and pick the best K models based on their prediction error on a hold-out set. ... D4RL Gym Datasets. D4RL is a widely-used benchmark for evaluating offline RL algorithms. It provides a variety of environments, tasks, and corresponding datasets... |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as CPU models, GPU models, or cloud computing instances. |
| Software Dependencies | No | The paper mentions software components such as "MLP", "Adam", and "OpenAI's ES" but does not give version numbers for them or list the other library dependencies, with versions, needed for reproduction. |
| Experiment Setup | Yes | Table 3: Hyperparameters of environment model for D4RL Gym experiments (e.g., number of models/elites 7/5, learning rate 10⁻⁴). Table 4: Hyperparameters of P3 for D4RL Gym experiments (e.g., policy network MLP(32, 32), horizon length H = 1000, number of reference vectors n = 5). |
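The Dataset Splits row and Table 3 (7 models, 5 elites) describe training an ensemble of N dynamics models and keeping the K with the lowest prediction error on a hold-out set. The Python sketch below illustrates only that selection step, under the assumption that the error metric is mean squared error; the function names `holdout_mse` and `select_elites` and the example error values are hypothetical and are not taken from the released code.

```python
from typing import List, Sequence


def holdout_mse(preds: Sequence[Sequence[float]],
                targets: Sequence[Sequence[float]]) -> float:
    """Mean squared prediction error of one model over a hold-out set.

    MSE is an assumed metric; the paper only says "prediction error".
    """
    total, count = 0.0, 0
    for pred_row, target_row in zip(preds, targets):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("empty hold-out set")
    return total / count


def select_elites(holdout_errors: Sequence[float], num_elites: int) -> List[int]:
    """Indices (ascending) of the num_elites models with the lowest error."""
    if not 1 <= num_elites <= len(holdout_errors):
        raise ValueError("num_elites must be between 1 and the number of models")
    # Rank model indices from lowest to highest hold-out error.
    ranked = sorted(range(len(holdout_errors)), key=lambda i: holdout_errors[i])
    return sorted(ranked[:num_elites])


if __name__ == "__main__":
    # Hypothetical hold-out errors for an ensemble of N = 7 models.
    errors = [0.12, 0.08, 0.30, 0.05, 0.09, 0.25, 0.07]
    print(select_elites(errors, num_elites=5))  # [0, 1, 3, 4, 6]
```

Only the elites would then be used to generate model rollouts; the two discarded models are simply never queried.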
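Table 4 sets the number of reference vectors to n = 5. Assuming these are unit vectors spread evenly over the two-dimensional objective space, as in reference-vector-based evolutionary multi-objective methods (the exact construction in P3 may differ), the sketch below builds them and assigns a candidate policy's objective vector to the reference vector at the smallest angle. The helper names `reference_vectors` and `nearest_reference`, and the assumption that both objectives have been shifted to non-negative values, are illustrative only.

```python
import math
from typing import List, Tuple

Vector = Tuple[float, float]


def reference_vectors(n: int) -> List[Vector]:
    """n unit vectors evenly spaced in angle over the first quadrant."""
    if n < 2:
        raise ValueError("need at least two reference vectors")
    step = (math.pi / 2) / (n - 1)
    return [(math.cos(i * step), math.sin(i * step)) for i in range(n)]


def nearest_reference(point: Vector, refs: List[Vector]) -> int:
    """Index of the reference vector with the smallest angle to `point`.

    Assumes (hypothetically) that `point` was already shifted so both
    objective values are non-negative.
    """
    norm = math.hypot(point[0], point[1])
    if norm == 0.0:
        raise ValueError("zero objective vector has no direction")
    # refs are unit vectors, so dot / norm is the cosine of the angle.
    cosines = [(point[0] * r[0] + point[1] * r[1]) / norm for r in refs]
    return max(range(len(refs)), key=lambda i: cosines[i])


if __name__ == "__main__":
    refs = reference_vectors(5)  # angles 0, 22.5, 45, 67.5, 90 degrees
    print(nearest_reference((1.0, 1.0), refs))  # 2
```

Partitioning the objective space this way is a common device for keeping a set of solutions spread along the trade-off front instead of clustered around one point.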
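The Pseudocode row lists Algorithm 3, Fitted Q evaluation (FQE) for Pareto policy selection. As a rough illustration of the general FQE idea only, the sketch below uses a tabular Q-function, for which the least-squares regression step reduces to averaging the bootstrapped targets for each state-action pair. The paper's Algorithm 3 likely uses function approximation instead, and the names `fitted_q_evaluation`, `estimate_value`, and `select_policy`, as well as the toy dataset, are hypothetical.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple

# (state, action, reward, next_state, done)
Transition = Tuple[Hashable, Hashable, float, Hashable, bool]
Policy = Callable[[Hashable], Hashable]
QTable = Dict[Tuple[Hashable, Hashable], float]


def fitted_q_evaluation(data: Sequence[Transition], policy: Policy,
                        gamma: float = 0.99, iterations: int = 100) -> QTable:
    """Tabular FQE: repeatedly fit Q to r + gamma * Q(s', policy(s'))."""
    q: QTable = {}
    for _ in range(iterations):
        sums: QTable = {}
        counts: Dict[Tuple[Hashable, Hashable], int] = {}
        for s, a, r, s_next, done in data:
            # Unseen (s', a') pairs default to a Q-value of 0.
            bootstrap = 0.0 if done else q.get((s_next, policy(s_next)), 0.0)
            target = r + gamma * bootstrap
            sums[(s, a)] = sums.get((s, a), 0.0) + target
            counts[(s, a)] = counts.get((s, a), 0) + 1
        # Least-squares fit over a free table = per-pair mean of the targets.
        q = {key: sums[key] / counts[key] for key in sums}
    return q


def estimate_value(q: QTable, policy: Policy,
                   initial_states: Sequence[Hashable]) -> float:
    """Average Q(s0, policy(s0)) over the given initial states."""
    return sum(q.get((s, policy(s)), 0.0) for s in initial_states) / len(initial_states)


def select_policy(data: Sequence[Transition], policies: Sequence[Policy],
                  initial_states: Sequence[Hashable],
                  gamma: float = 0.99, iterations: int = 100) -> int:
    """Index of the candidate policy with the highest FQE value estimate."""
    values = [estimate_value(fitted_q_evaluation(data, pi, gamma, iterations),
                             pi, initial_states) for pi in policies]
    return max(range(len(values)), key=lambda i: values[i])


def always_a(state: Hashable) -> Hashable:
    return "a"


def b_then_a(state: Hashable) -> Hashable:
    return "b" if state == "s0" else "a"


if __name__ == "__main__":
    toy_data = [
        ("s0", "a", 0.0, "s1", False),
        ("s1", "a", 1.0, "s1", True),
        ("s0", "b", 0.5, "s1", True),
    ]
    print(select_policy(toy_data, [always_a, b_then_a], ["s0"], gamma=0.9))  # 0
```

Because FQE only reuses the logged transitions, it can rank the policies in a pool without any further environment interaction, which is what an offline setting requires.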