Model-Bellman Inconsistency for Model-based Offline Reinforcement Learning

Authors: Yihao Sun, Jiaji Zhang, Chengxing Jia, Haoxin Lin, Junyin Ye, Yang Yu

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, we have verified that our proposed uncertainty quantification can be significantly closer to the true Bellman error than the compared methods. Consequently, MOBILE outperforms prior offline RL approaches on most tasks of the D4RL and NeoRL benchmarks." (See the penalty sketch below the table.)
Researcher Affiliation | Collaboration | (1) National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu, China; (2) Polixir Technologies, Nanjing, Jiangsu, China; (3) Peng Cheng Laboratory, Shenzhen, 518055, China.
Pseudocode | Yes | Algorithm 1: MOBILE
Open Source Code | Yes | "The code is available at https://github.com/yihaosun1124/mobile."
Open Datasets | Yes | "standard D4RL offline RL benchmark (Fu et al., 2020), which includes Gym and Adroit domains, as well as the near-real-world NeoRL (Qin et al., 2022) benchmark" (See the dataset-loading sketch below the table.)
Dataset Splits | Yes | "We train an ensemble of 7 such dynamics models following (Janner et al., 2019; Yu et al., 2020) and pick the best 5 models based on the validation prediction error on a held-out set that contains 1000 transitions in the offline dataset D." (See the model-selection sketch below the table.)
Hardware Specification | Yes | "All the experiments are run with a single GeForce GTX 3070 GPU and an AMD Ryzen 5900X CPU at 4.8GHz."
Software Dependencies | No | The paper mentions 'SAC' and 'Adam optimizers' but does not provide specific version numbers for any software dependencies or frameworks.
Experiment Setup | Yes | Table 3, Hyperparameters of Policy Optimization in MOBILE: K = 2; policy network FC(256, 256); Q-network FC(256, 256); τ = 5e-3; γ = 0.99; actor lr = 1e-4; critic lr = 3e-4; batch size = 256; N_iter = 3M. (See the configuration sketch below the table.)
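To make the "Research Type" row's claim concrete, below is a minimal sketch of a Bellman-inconsistency-style uncertainty estimate. It assumes, following the paper's title, that the uncertainty at a state-action pair is measured as the disagreement (standard deviation) of Bellman targets computed under each dynamics model in the ensemble; `ensemble_models`, `policy`, and `target_q` are hypothetical placeholders, not the authors' actual interfaces.

```python
import numpy as np

def bellman_inconsistency(s, a, ensemble_models, policy, target_q, gamma=0.99):
    """Sketch: spread of model-based Bellman targets at (s, a).

    Assumption (not taken verbatim from the paper): each ensemble member
    predicts a one-step transition, and the uncertainty penalty is the
    standard deviation of the resulting Bellman targets across members.
    """
    targets = []
    for model in ensemble_models:                 # e.g. the 5 selected dynamics models
        s_next, reward = model.predict(s, a)      # imagined next state and reward
        a_next = policy.act(s_next)               # action from the current policy
        targets.append(reward + gamma * target_q(s_next, a_next))
    return np.std(np.stack(targets), axis=0)      # disagreement across the ensemble
```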
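The "Open Datasets" row refers to the public D4RL benchmark. As a usage illustration only (the task name `hopper-medium-v2` is an arbitrary example, not a claim about which specific datasets the paper uses beyond the Gym and Adroit domains it names), a D4RL dataset can be loaded like this:

```python
import gym
import d4rl  # importing d4rl registers the benchmark environments with gym

# Example task name only; the paper evaluates on the Gym and Adroit domains of D4RL.
env = gym.make("hopper-medium-v2")
dataset = d4rl.qlearning_dataset(env)  # dict of observations, actions, rewards, ...

print(dataset["observations"].shape, dataset["actions"].shape)
```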
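The "Dataset Splits" row describes training 7 dynamics models and keeping the 5 with the lowest prediction error on 1000 held-out transitions. Below is a minimal sketch of that selection step, assuming hypothetical helpers `train_dynamics_model` and `validation_error` and an array-like `dataset` of transitions; none of these names come from the authors' code.

```python
import numpy as np

def build_ensemble(dataset, train_dynamics_model, validation_error,
                   n_train=7, n_keep=5, n_holdout=1000, seed=0):
    """Sketch: train n_train dynamics models, keep the n_keep best on a held-out set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(dataset))
    holdout = dataset[idx[:n_holdout]]            # 1000 held-out transitions
    train = dataset[idx[n_holdout:]]
    models = [train_dynamics_model(train, seed=i) for i in range(n_train)]
    errors = [validation_error(m, holdout) for m in models]
    keep = np.argsort(errors)[:n_keep]            # indices of the lowest-error models
    return [models[i] for i in keep]
```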
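Finally, the "Experiment Setup" row reproduces Table 3 of the paper. Written out as an illustrative Python configuration (the key names are chosen here for readability and are not taken from the authors' repository):

```python
# Policy-optimization hyperparameters quoted from Table 3 of the paper.
# Key names are illustrative, not the repository's actual config schema.
mobile_policy_config = {
    "K": 2,                         # as listed in Table 3
    "policy_network": (256, 256),   # fully connected hidden layer sizes
    "q_network": (256, 256),
    "tau": 5e-3,                    # target-network soft-update coefficient
    "gamma": 0.99,                  # discount factor
    "actor_lr": 1e-4,
    "critic_lr": 3e-4,
    "batch_size": 256,
    "n_iterations": 3_000_000,      # 3M training iterations
}
```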