MICRO: Model-Based Offline Reinforcement Learning with a Conservative Bellman Operator

Authors: Xiao-Yin Liu, Xiao-Hu Zhou, Guotao Li, Hao Li, Mei-Jiang Gui, Tian-Yu Xiang, De-Xing Huang, Zeng-Guang Hou

IJCAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments demonstrate that MICRO outperforms prior RL algorithms in offline RL benchmark and is considerably robust to adversarial perturbations." "In this paper, the theoretical and experimental results show that MICRO has the guarantee of a robust policy improvement and outperforms current state-of-the-art algorithms on the D4RL dataset benchmark. Furthermore, MICRO achieves better robustness on different adversarial attacks."
Researcher Affiliation | Academia | "1 State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China. 2 School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China"
Pseudocode | Yes | "Algorithm 1: Model-based offline reinforcement learning with a conservative Bellman operator (MICRO)"
Open Source Code | Yes | "The code for MICRO is available at github.com/xiaoyinliu0714/MICRO."
Open Datasets | Yes | "We answer the above questions using D4RL benchmark [Fu et al., 2020] with several control tasks and datasets." The cited reference is: [Fu et al., 2020] Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, and Sergey Levine. D4RL: Datasets for deep data-driven reinforcement learning. arXiv preprint arXiv:2004.07219, 2020. (A minimal loading sketch appears after this table.)
Dataset Splits | No | "We answer the above questions using D4RL benchmark [Fu et al., 2020] with several control tasks and datasets." The paper mentions the D4RL benchmark but does not explicitly state dataset splits (e.g., percentages or counts for training, validation, or test sets) in the main text.
Hardware Specification | No | "The simulation environment of D4RL tasks is based on the MuJoCo physics simulator." No specific hardware details (e.g., GPU/CPU models, memory) are provided for the experimental setup.
Software Dependencies | No | "The simulation environment of D4RL tasks is based on the MuJoCo physics simulator." and "The algorithm MICRO, presented in Algorithm 1, is built on model-based policy optimization (MBPO) [Janner et al., 2019] and soft actor-critic (SAC) [Haarnoja et al., 2018]." The paper does not provide version numbers for the software dependencies or libraries used in the experiments.
Experiment Setup | Yes | "The algorithm MICRO, presented in Algorithm 1, is built on model-based policy optimization (MBPO) [Janner et al., 2019] and soft actor-critic (SAC) [Haarnoja et al., 2018]." "The MLE method is used to train N ensemble dynamics models {T^i_φ = N(μ^i_φ, σ^i_φ)}_{i=1}^N." "The model data is collected through h-step rollouts, α is the entropy regularization coefficient, Q_{ω_i} denotes the i-th Q-function, and K is the number of critics." "The coefficient β is introduced in Eq. (6) to adjust the value." The paper also notes that more hyperparameters and implementation details are provided in Appendix B. (A training sketch for the dynamics ensemble appears after this table.)
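
The Open Datasets row above cites the D4RL benchmark. Below is a minimal sketch of loading one such dataset with the public d4rl Python package; the task name 'hopper-medium-v2' is an illustrative choice, since the table does not list which tasks the paper evaluates on.

# Minimal sketch: loading a D4RL offline dataset.
# The task name 'hopper-medium-v2' is an illustrative assumption, not
# necessarily one of the datasets used in the paper.
import gym
import d4rl  # importing d4rl registers the D4RL environments with gym

env = gym.make('hopper-medium-v2')
data = d4rl.qlearning_dataset(env)  # dict of numpy arrays

# Offline transitions (s, a, r, s', done) consumed by offline RL algorithms.
print(data['observations'].shape, data['actions'].shape,
      data['rewards'].shape, data['next_observations'].shape,
      data['terminals'].shape)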
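
The Experiment Setup row describes an ensemble of N Gaussian dynamics models trained by maximum likelihood, following MBPO. The sketch below illustrates that component in PyTorch; the names (GaussianDynamics, mle_loss, train_ensemble), the network sizes, and the log-std bounds are assumptions for illustration, not taken from the MICRO repository.

# Hedged sketch of the MLE-trained Gaussian dynamics ensemble described above.
# Names, layer widths, and clamping bounds are illustrative assumptions.
import torch
import torch.nn as nn

class GaussianDynamics(nn.Module):
    """One ensemble member T^i_φ = N(μ^i_φ, σ^i_φ) predicting (state delta, reward)."""
    def __init__(self, state_dim, action_dim, hidden=200):
        super().__init__()
        out_dim = state_dim + 1  # next-state delta plus scalar reward
        self.body = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
        )
        self.mean = nn.Linear(hidden, out_dim)
        self.log_std = nn.Linear(hidden, out_dim)

    def forward(self, state, action):
        h = self.body(torch.cat([state, action], dim=-1))
        mu = self.mean(h)
        log_std = self.log_std(h).clamp(-10.0, 2.0)  # assumed variance bounds
        return mu, log_std

def mle_loss(model, state, action, target):
    """Negative log-likelihood (up to constants) of the Gaussian model: the MLE objective."""
    mu, log_std = model(state, action)
    inv_var = torch.exp(-2.0 * log_std)
    return (((target - mu) ** 2) * inv_var + 2.0 * log_std).mean()

def train_ensemble(models, optimizers, batch):
    """One gradient step for each of the N ensemble members on offline transitions."""
    state, action, next_state, reward = batch
    target = torch.cat([next_state - state, reward.unsqueeze(-1)], dim=-1)
    for model, opt in zip(models, optimizers):
        opt.zero_grad()
        loss = mle_loss(model, state, action, target)
        loss.backward()
        opt.step()

In MBPO-style training, each member predicts a Gaussian over the state change and reward on the offline data, and synthetic model data is then generated with short h-step rollouts branched from dataset states, as the Experiment Setup row indicates.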