Reward-Consistent Dynamics Models are Strongly Generalizable for Offline Reinforcement Learning

Authors: Fan-Ming Luo, Tian Xu, Xingchen Cao, Yang Yu

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct a series of experiments designed to answer the following questions: Q1: What is the appearance of the recovered dynamics reward? (Fig. 2, 5) Q2: Can the recovered dynamics reward enhance the accuracy of rollouts? (Fig. 3, 4, 6) Q3: Can MOREC facilitate learning policies with superior performance? (Table 1, 2, 3)
Researcher Affiliation | Collaboration | Fan-Ming Luo, Tian Xu, Xingchen Cao & Yang Yu; National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China; Polixir.ai; {luofm,xut,caoxc,yuy}@lamda.nju.edu.cn
Pseudocode | Yes | Algorithm 1: MOREC-MOPO
Open Source Code | Yes | Code is available at https://github.com/polixir/morec.
Open Datasets | Yes | Additionally, we evaluate MOREC on 21 typical tasks from two offline benchmarks, D4RL (Fu et al., 2020) and NeoRL (Qin et al., 2022).
Dataset Splits | Yes | Importantly, prior to training, we partition our dataset into training and validation subsets. The model exclusively utilizes the training subset for parameter updates. For reference, the validation root mean square errors (RMSEs) are presented in Table 12.
Hardware Specification | Yes | All experiments were conducted on a workstation outfitted with an Intel Xeon Gold 5218R CPU, 4 NVIDIA RTX 3090 GPUs, and 250GB of RAM, running Ubuntu 20.04.
Software Dependencies | No | The paper mentions using 'OfflineRL-Kit codebase (Sun, 2023)' and 'Optimizer Adam', but does not provide specific version numbers for multiple key software libraries or dependencies, only citing the original papers for algorithms or a year for the codebase.
Experiment Setup | Yes | The hyper-parameters for MOREC-MOPO and MOREC-MOBILE derive from the default parameters specified in MOPO and MOBILE in OfflineRL-Kit. ... This adaptation leads to the consolidated hyper-parameters for both MOREC-MOPO and MOREC-MOBILE, as detailed in Table 5. ... The finalized hyper-parameter configurations for MOREC-MOPO can be found in Table 6. ... The definitive hyper-parameter settings for MOREC-MOBILE are detailed in Table 7.
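
The Dataset Splits entry above describes partitioning the offline data into training and validation subsets and reporting validation RMSE. A minimal sketch of that procedure follows. It is not taken from the MOREC code: the function names, the holdout ratio, the seed, and the toy stand-in model are all illustrative assumptions, since the summary does not state the split ratio or how the split is drawn.

```python
import math
import random


def split_transitions(transitions, holdout_ratio=0.1, seed=0):
    """Shuffle transitions and split them into (train, validation) lists.

    holdout_ratio and seed are assumed values, not taken from the paper.
    """
    indices = list(range(len(transitions)))
    random.Random(seed).shuffle(indices)
    n_val = max(1, int(len(transitions) * holdout_ratio))
    val = [transitions[i] for i in indices[:n_val]]
    train = [transitions[i] for i in indices[n_val:]]
    return train, val


def rmse(predictions, targets):
    """Root mean square error over all dimensions of all samples."""
    squared = [
        (p - t) ** 2
        for pred, target in zip(predictions, targets)
        for p, t in zip(pred, target)
    ]
    return math.sqrt(sum(squared) / len(squared))


def predict(state, action):
    """Hypothetical stand-in for a learned dynamics model."""
    return [s + a for s, a in zip(state, action)]


# Toy data: each transition is (state, action, next_state) with 1-D vectors.
transitions = [([float(i)], [1.0], [float(i) + 1.0]) for i in range(20)]
train, val = split_transitions(transitions, holdout_ratio=0.2, seed=0)

# Only the validation subset is used to measure prediction error.
val_predictions = [predict(s, a) for s, a, _ in val]
val_targets = [ns for _, _, ns in val]
val_rmse = rmse(val_predictions, val_targets)
```

In a real run, a learned model fitted only on `train` would replace `predict`, and `val_rmse` would play the role of the validation error the paper reports in Table 12.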