MICRO: Model-Based Offline Reinforcement Learning with a Conservative Bellman Operator
Authors: Xiao-Yin Liu, Xiao-Hu Zhou, Guotao Li, Hao Li, Mei-Jiang Gui, Tian-Yu Xiang, De-Xing Huang, Zeng-Guang Hou
IJCAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that MICRO outperforms prior RL algorithms on offline RL benchmarks and is considerably robust to adversarial perturbations. The theoretical and experimental results show that MICRO guarantees robust policy improvement and outperforms current state-of-the-art algorithms on the D4RL benchmark. Furthermore, MICRO achieves better robustness under different adversarial attacks. |
| Researcher Affiliation | Academia | State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China |
| Pseudocode | Yes | Algorithm 1 Model-based offline reinforcement learning with a conservative Bellman operator (MICRO) |
| Open Source Code | Yes | The code for MICRO is available at github.com/xiaoyinliu0714/MICRO. |
| Open Datasets | Yes | We answer the above questions using the D4RL benchmark [Fu et al., 2020] with several control tasks and datasets (a minimal data-loading sketch is given after the table). Citation: [Fu et al., 2020] Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, and Sergey Levine. D4RL: Datasets for deep data-driven reinforcement learning. arXiv preprint arXiv:2004.07219, 2020. |
| Dataset Splits | No | We answer the above questions using D4RL benchmark [Fu et al., 2020] with several control tasks and datasets. The paper mentions the D4RL benchmark but does not explicitly state the dataset splits (e.g., percentage or count for training, validation, or test sets) in the main text. |
| Hardware Specification | No | The simulation environment of D4RL tasks is based on the MuJoCo physics simulator. No specific hardware details (e.g., GPU/CPU models, memory) are provided for the experimental setup. |
| Software Dependencies | No | The simulation environment of D4RL tasks is based on the MuJoCo physics simulator. The algorithm MICRO, presented in Algorithm 1, is built on model-based policy optimization (MBPO) [Janner et al., 2019] and soft actor-critic (SAC) [Haarnoja et al., 2018]. The paper does not provide specific version numbers for the software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | The algorithm MICRO, presented in Algorithm 1, is built on model-based policy optimization (MBPO) [Janner et al., 2019] and soft actor-critic (SAC) [Haarnoja et al., 2018]. The MLE method is used to train $N$ ensemble dynamics models $\{T^i_\phi = \mathcal{N}(\mu^i_\phi, \sigma^i_\phi)\}_{i=1}^{N}$. Model data is collected through $h$-step rollouts, $\alpha$ is the entropy regularization coefficient, $Q_{\omega_i}$ denotes the $i$-th Q-function, and $K$ is the number of critics. The coefficient $\beta$ is introduced in Eq. (6) to adjust the value. The paper notes that more hyperparameters and implementation details are provided in Appendix B (an illustrative dynamics-model sketch is given after the table). |
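
The Open Datasets row cites the D4RL benchmark. Below is a minimal sketch of loading one of its datasets with the public `d4rl` Python package; the task name is only an example, and this is not part of the paper's released code.

```python
# Minimal sketch: loading a D4RL dataset with the public d4rl package.
# The task name is an example; the paper evaluates several D4RL tasks.
import gym
import d4rl  # registers the D4RL environments with gym

env = gym.make("halfcheetah-medium-v2")
dataset = d4rl.qlearning_dataset(env)  # dict of numpy arrays

print(dataset["observations"].shape)       # (num_transitions, obs_dim)
print(dataset["actions"].shape)            # (num_transitions, act_dim)
print(dataset["rewards"].shape)            # (num_transitions,)
print(dataset["next_observations"].shape)  # (num_transitions, obs_dim)
```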
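
The Experiment Setup row states that $N$ Gaussian dynamics models are trained by maximum likelihood. The sketch below illustrates one ensemble member and its MLE loss in PyTorch; the hidden sizes, clamping range, and names are assumptions of ours, not the authors' implementation, whose hyperparameters are given in Appendix B and the official repository.

```python
# Illustrative sketch (not the authors' released code): one member of the
# N-model Gaussian dynamics ensemble {T^i_phi = N(mu^i_phi, sigma^i_phi)},
# trained by maximum likelihood as described in the Experiment Setup row.
# Hidden sizes and variable names are assumptions.
import torch
import torch.nn as nn

class GaussianDynamics(nn.Module):
    """Predicts a Gaussian over (next state, reward) given (state, action)."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 200):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
        )
        self.mu = nn.Linear(hidden, obs_dim + 1)       # mean of (s', r)
        self.log_std = nn.Linear(hidden, obs_dim + 1)  # log std of (s', r)

    def forward(self, obs, act):
        h = self.trunk(torch.cat([obs, act], dim=-1))
        log_std = self.log_std(h).clamp(-10.0, 2.0)    # keep std numerically sane
        return torch.distributions.Normal(self.mu(h), log_std.exp())

def mle_loss(model, obs, act, next_obs, reward):
    """Negative log-likelihood of the observed transition under the model."""
    target = torch.cat([next_obs, reward.unsqueeze(-1)], dim=-1)
    return -model(obs, act).log_prob(target).sum(-1).mean()

# An ensemble is simply N independently initialized models, each trained with
# mle_loss on mini-batches from the offline dataset (sizes here are examples).
models = [GaussianDynamics(obs_dim=17, act_dim=6) for _ in range(7)]
```

Independently initialized ensemble members are what model-based offline methods typically use to quantify model uncertainty through their disagreement.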