reproducibilityindex.ai

Reward-Consistent Dynamics Models are Strongly Generalizable for Offline Reinforcement Learning

Authors: Fan-Ming Luo, Tian Xu, Xingchen Cao, Yang Yu

ICLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we conduct a series of experiments designed to answer the following questions: Q1: What is the appearance of the recovered dynamics reward? (Fig. 2, 5) Q2: Can the recovered dynamics reward enhance the accuracy of rollouts? (Fig. 3, 4, 6) Q3: Can MOREC facilitate learning policies with superior performance? (Table 1, 2, 3)
Researcher Affiliation	Collaboration	Fan-Ming Luo, Tian Xu, Xingchen Cao & Yang Yu; National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China; Polixir.ai; {luofm,xut,caoxc,yuy}@lamda.nju.edu.cn
Pseudocode	Yes	Algorithm 1: MOREC-MOPO
Open Source Code	Yes	Code is available at https://github.com/polixir/morec.
Open Datasets	Yes	Additionally, we evaluate MOREC on 21 typical tasks from two offline benchmarks, D4RL (Fu et al., 2020) and NeoRL (Qin et al., 2022).
Dataset Splits	Yes	Importantly, prior to training, we partition our dataset into training and validation subsets. The model exclusively utilizes the training subset for parameter updates. For reference, the validation root mean square errors (RMSEs) are presented in Table 12.
Hardware Specification	Yes	All experiments were conducted on a workstation outfitted with an Intel Xeon Gold 5218R CPU, 4 NVIDIA RTX 3090 GPUs, and 250GB of RAM, running Ubuntu 20.04.
Software Dependencies	No	The paper mentions using 'OfflineRL-Kit codebase (Sun, 2023)' and 'Optimizer Adam', but gives no version numbers for key software libraries or dependencies; it cites only the original papers for the algorithms and a year for the codebase.
Experiment Setup	Yes	The hyper-parameters for MOREC-MOPO and MOREC-MOBILE derive from the default parameters specified in MOPO and MOBILE in OfflineRL-Kit. ... This adaptation leads to the consolidated hyper-parameters for both MOREC-MOPO and MOREC-MOBILE, as detailed in Table 5. ... The finalized hyper-parameter configurations for MOREC-MOPO can be found in Table 6. ... The definitive hyper-parameter settings for MOREC-MOBILE are detailed in Table 7.
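
The Dataset Splits row describes partitioning the offline data into training and validation subsets and reporting validation RMSE. A minimal sketch of that general procedure follows, using only the Python standard library. It is not the authors' code: the function names `split_transitions` and `rmse`, the 10% validation fraction, the seed, and the toy data are all assumptions made for illustration, and the paper's actual split ratio is not stated in the excerpt above.

```python
import math
import random


def split_transitions(transitions, val_fraction=0.1, seed=0):
    """Shuffle and partition transitions into (train, validation) lists.

    val_fraction and seed are illustrative assumptions, not values from the paper.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be in (0, 1)")
    indices = list(range(len(transitions)))
    random.Random(seed).shuffle(indices)
    n_val = max(1, int(len(indices) * val_fraction))
    val = [transitions[i] for i in indices[:n_val]]
    train = [transitions[i] for i in indices[n_val:]]
    return train, val


def rmse(predictions, targets):
    """Root mean square error over all components of all vectors."""
    squared = [
        (p - t) ** 2
        for pred, target in zip(predictions, targets)
        for p, t in zip(pred, target)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Toy data: (state, action, next_state) tuples with next_state = state + action.
transitions = [((float(i),), (1.0,), (float(i) + 1.0,)) for i in range(100)]
train, val = split_transitions(transitions, val_fraction=0.1, seed=0)

# Hypothetical stand-in "model" that predicts next_state = state (ignores the action).
# Only the held-out validation subset is used for the error estimate.
preds = [state for state, action, next_state in val]
targets = [next_state for state, action, next_state in val]
val_rmse = rmse(preds, targets)
```

In a real pipeline the stand-in predictor would be replaced by a dynamics model fitted on `train` only, so that `val_rmse` measures error on transitions the model never used for parameter updates.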