Pareto Policy Pool for Model-based Offline Reinforcement Learning

Authors: Yijun Yang, Jing Jiang, Tianyi Zhou, Jie Ma, Yuhui Shi

ICLR 2022

Reproducibility checklist. Each entry lists the variable, the assessed result, and the supporting evidence extracted from the paper.

Research Type: Experimental
Evidence: "On the D4RL benchmark for offline RL, P3 substantially outperforms several recent baseline methods over multiple tasks, especially when the quality of pre-collected experiences is low. ... This section aims to answer the following questions by evaluating P3 with other offline RL methods on the datasets from the D4RL Gym benchmark (Fu et al., 2020)."

Researcher Affiliation: Academia
Evidence: "1 Australian Artificial Intelligence Institute, University of Technology Sydney; 2 University of Washington, Seattle; 3 University of Maryland, College Park; 4 Department of Computer Science and Engineering, Southern University of Science and Technology"

Pseudocode: Yes
Evidence: "Algorithm 1: Pareto policy pool (P3) for model-based offline RL; Algorithm 2: A two-stage method for solving constrained bi-objective optimization; Algorithm 3: Fitted Q evaluation (FQE) for Pareto policy selection" (a minimal FQE sketch appears after this checklist)

Open Source Code: Yes
Evidence: "Code is available at https://github.com/OverEuro/P3."

Open Datasets: Yes
Evidence: "We evaluate P3 and compare it with several state-of-the-art offline RL methods on the standard D4RL Gym benchmark (Fu et al., 2020)." (a dataset-loading snippet appears after this checklist)

Dataset Splits: Yes
Evidence: "We train an ensemble of N models and pick the best K models based on their prediction error on a hold-out set. ... D4RL Gym Datasets. D4RL is a widely-used benchmark for evaluating offline RL algorithms. It provides a variety of environments, tasks, and corresponding datasets..." (an elite-selection sketch appears after this checklist)

Hardware Specification: No
The paper does not specify the hardware used for the experiments, such as CPU models, GPU models, or cloud computing instances.

Software Dependencies: No
The paper names software components such as MLP, Adam, and OpenAI's ES, but does not give version numbers or other library dependencies needed for reproduction.

Experiment Setup: Yes
Evidence: "Table 3: Hyperparameters of environment model for D4RL Gym experiments" (e.g., number of models/elites 7/5, learning rate 10^-4); "Table 4: Hyperparameters of P3 for D4RL Gym experiments" (e.g., policy network MLP(32, 32), horizon length H 1000, number of reference vectors n 5). These values are transcribed into a config dict after this checklist.

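The FQE procedure named in Algorithm 3 is a standard off-policy evaluation method: regress Q toward r + gamma * Q_target(s', pi(s')) on the offline transitions, then rank candidate policies by their estimated values. A minimal PyTorch sketch follows; the network size, iteration count, and target-sync period are assumptions rather than the paper's settings, and `policy` stands for any callable mapping observation tensors to action tensors.

    import torch
    import torch.nn as nn

    class QNet(nn.Module):
        """Small Q-network; the architecture is an assumption, not the paper's."""
        def __init__(self, obs_dim, act_dim, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, obs, act):
            return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

    def fqe_score(policy, obs, act, rew, next_obs, done,
                  iters=500, sync_every=50, gamma=0.99, lr=1e-4):
        """Estimate Q^pi by iterated regression: Q <- r + gamma * Q_tgt(s', pi(s'))."""
        q = QNet(obs.shape[-1], act.shape[-1])
        q_tgt = QNet(obs.shape[-1], act.shape[-1])
        q_tgt.load_state_dict(q.state_dict())
        opt = torch.optim.Adam(q.parameters(), lr=lr)
        for it in range(iters):
            with torch.no_grad():
                target = rew + gamma * (1.0 - done) * q_tgt(next_obs, policy(next_obs))
            loss = ((q(obs, act) - target) ** 2).mean()
            opt.zero_grad(); loss.backward(); opt.step()
            if (it + 1) % sync_every == 0:
                q_tgt.load_state_dict(q.state_dict())  # periodic target sync
        with torch.no_grad():
            # Score the policy by its mean estimated value over dataset states.
            return q(obs, policy(obs)).mean().item()

    # Selection over the Pareto pool:
    # best = max(pool, key=lambda pi: fqe_score(pi, obs, act, rew, next_obs, done))
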
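The D4RL datasets referenced in the Open Datasets entry are loaded through D4RL's Gym wrapper. A minimal sketch, assuming the standard d4rl package is installed (the task name is illustrative):

    import gym
    import d4rl  # registers the D4RL environments with Gym on import

    env = gym.make('halfcheetah-medium-v2')  # any D4RL Gym task name
    data = d4rl.qlearning_dataset(env)       # dict of transition arrays
    print(data['observations'].shape, data['actions'].shape,
          data['rewards'].shape, data['next_observations'].shape,
          data['terminals'].shape)

    # D4RL's normalized score (0 = random policy, 100 = expert policy) is the
    # metric used to compare offline RL methods across tasks.
    print(env.get_normalized_score(3000.0) * 100)
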
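The hold-out model selection described in the Dataset Splits evidence reduces to ranking the N trained dynamics models by validation error and keeping the top K. A sketch under the assumption that each model is a callable m(obs, act) returning the predicted next state (a hypothetical interface):

    import torch

    def select_elites(models, holdout_obs, holdout_act, holdout_next_obs, k=5):
        """Keep the k ensemble members with the lowest one-step MSE on the hold-out split."""
        with torch.no_grad():
            errs = [((m(holdout_obs, holdout_act) - holdout_next_obs) ** 2).mean().item()
                    for m in models]
        elite_idx = sorted(range(len(models)), key=lambda i: errs[i])[:k]
        return [models[i] for i in elite_idx]
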
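For reference, the hyperparameters quoted from Tables 3 and 4 in the Experiment Setup entry, collected into one config dict (the key names are illustrative, and the paper's tables contain further settings not repeated here):

    P3_CONFIG = {
        # environment model (Table 3)
        "num_models": 7,         # ensemble size N
        "num_elites": 5,         # elites K kept by hold-out error
        "model_lr": 1e-4,        # learning rate 10^-4
        # policy search (Table 4)
        "policy_net": (32, 32),  # MLP hidden layer sizes
        "horizon_H": 1000,       # rollout horizon length
        "num_ref_vectors": 5,    # number of reference vectors n
    }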