Model-Bellman Inconsistency for Model-based Offline Reinforcement Learning
Authors: Yihao Sun, Jiaji Zhang, Chengxing Jia, Haoxin Lin, Junyin Ye, Yang Yu
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically we have verified that our proposed uncertainty quantification can be significantly closer to the true Bellman error than the compared methods. Consequently, MOBILE outperforms prior offline RL approaches on most tasks of D4RL and NeoRL benchmarks. |
| Researcher Affiliation | Collaboration | 1National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu, China 2Polixir Technologies, Nanjing, Jiangsu, China 3Peng Cheng Laboratory, Shenzhen, 518055, China. |
| Pseudocode | Yes | Algorithm 1 MOBILE |
| Open Source Code | Yes | The code is available at https://github.com/yihaosun1124/mobile. |
| Open Datasets | Yes | standard D4RL offline RL benchmark (Fu et al., 2020), which includes Gym and Adroit domains, as well as the near-real-world NeoRL (Qin et al., 2022) benchmark. |
| Dataset Splits | Yes | We train an ensemble of 7 such dynamics models following (Janner et al., 2019; Yu et al., 2020) and pick the best 5 models based on the validation prediction error on a held-out set that contains 1000 transitions in the offline dataset D. (See the ensemble-selection sketch after this table.) |
| Hardware Specification | Yes | All the experiments are run with a single GeForce GTX 3070 GPU and an AMD Ryzen 5900X CPU at 4.8GHz. |
| Software Dependencies | No | The paper mentions SAC and the Adam optimizer but does not provide specific version numbers for any software dependencies or frameworks. |
| Experiment Setup | Yes | Table 3. Hyperparameters of Policy Optimization in MOBILE: K = 2; policy network FC(256, 256); Q-network FC(256, 256); τ = 5e-3; γ = 0.99; actor learning rate 1e-4; critic learning rate 3e-4; batch size 256; N_iter = 3M. (See the config sketch after this table.) |
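
The Dataset Splits row describes a train-7 / keep-5 ensemble selection rule based on held-out prediction error. Below is a minimal Python sketch of that selection step; the `prediction_error` method and the function name are hypothetical placeholders, not identifiers from the released code.

```python
import numpy as np

def select_ensemble_members(models, holdout_batch, n_keep=5):
    """Keep the n_keep dynamics models with the lowest validation error.

    `holdout_batch` stands for the held-out transitions (1000 in the paper);
    `prediction_error` is an assumed per-model method returning a scalar loss.
    """
    obs, act, next_obs, rew = holdout_batch
    errors = [float(m.prediction_error(obs, act, next_obs, rew)) for m in models]
    best = np.argsort(errors)[:n_keep]  # indices of the n_keep lowest-error models
    return [models[i] for i in best]
```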
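The Experiment Setup row quotes the paper's Table 3 policy-optimization hyperparameters. The dictionary below restates them as a config sketch for readability; the key names are illustrative and do not come from the authors' repository.

```python
# Illustrative config mirroring Table 3 of the paper (key names are assumptions).
mobile_policy_config = {
    "num_q_networks": 2,           # K: size of the Q-ensemble
    "policy_hidden": (256, 256),   # actor MLP layer sizes
    "q_hidden": (256, 256),        # critic MLP layer sizes
    "tau": 5e-3,                   # soft target-update coefficient
    "gamma": 0.99,                 # discount factor
    "actor_lr": 1e-4,              # learning rate of the actor
    "critic_lr": 3e-4,             # learning rate of the critic
    "batch_size": 256,
    "num_iterations": 3_000_000,   # N_iter
}
```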