CLARE: Conservative Model-Based Reward Learning for Offline Inverse Reinforcement Learning

Authors: Sheng Yue, Guanbo Wang, Wei Shao, Zhaofeng Zhang, Sen Lin, Ju Ren, Junshan Zhang

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments corroborate the significant performance gains of CLARE over existing state-of-the-art algorithms on MuJoCo continuous control tasks (especially with a small offline dataset), and the learned reward is highly instructive for further learning (source code)."
Researcher Affiliation | Academia | "Sheng Yue1, Guanbo Wang2, Wei Shao3, Zhaofeng Zhang4, Sen Lin5, Ju Ren1,6, Junshan Zhang3; 1Tsinghua University, 2Tongji University, 3University of California, Davis, 4Arizona State University, 5Ohio State University, 6Zhongguancun Laboratory"
Pseudocode | Yes | "Algorithm 1: Conservative model-based reward learning (CLARE)" (see the algorithm sketch after this table)
Open Source Code | Yes | "Extensive experiments corroborate the significant performance gains of CLARE over existing state-of-the-art algorithms on MuJoCo continuous control tasks (especially with a small offline dataset), and the learned reward is highly instructive for further learning (source code)." and "Our implementation is built upon the open source framework of offline RL algorithms, provided at: https://github.com/polixir/OfflineRL"
Open Datasets | Yes | "we compare CLARE with the following existing offline IRL methods on the D4RL benchmark (Fu et al., 2020)" and "the D4RL dataset provided at: https://github.com/rail-berkeley/d4rl (under the Apache License 2.0)." (see the dataset-loading sketch after this table)
Dataset Splits | No | The paper mentions selecting dynamics models by "validation prediction error on a held-out set" but does not report the train/validation/test splits (percentages, counts, or standard splits) used for the main D4RL datasets. (see the split sketch after this table)
Hardware Specification | Yes | "We implement the code in PyTorch 1.11.0 on a server with a 32-core AMD Ryzen Threadripper PRO 3975WX and an NVIDIA GeForce RTX 3090 Ti."
Software Dependencies | Yes | "We implement the code in PyTorch 1.11.0"
Experiment Setup | Yes | "Appendix A.2 HYPERPARAMETERS" and "Table 2: Hyperparameters for CLARE.", which lists specific values for learning rates, batch size, horizon, regularization weight, discount factor, and number of steps/epochs. (see the configuration sketch after this table)
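For the Pseudocode row: a minimal Python sketch of a conservative model-based reward learning loop in the spirit of the paper's Algorithm 1. This is an illustrative assumption, not the authors' implementation: the helper names (rollout_in_model, reward_update), the weighting parameter beta, and the exact update rule are hypothetical, and the interleaved policy-improvement step is omitted.

```python
import torch

@torch.no_grad()
def rollout_in_model(policy, dynamics, start_obs, horizon):
    """Roll the current policy forward inside the learned dynamics model."""
    obs, obs_list, act_list = start_obs, [], []
    for _ in range(horizon):
        act = policy(obs)
        obs_list.append(obs)
        act_list.append(act)
        obs = dynamics(obs, act)  # model-predicted next state
    return torch.cat(obs_list), torch.cat(act_list)

def reward_update(reward_net, reward_opt, expert, diverse, policy, dynamics,
                  start_obs, beta=0.5, horizon=5):
    """One hypothetical 'conservative' reward step: raise the reward on
    weighted offline (expert/diverse) data and lower it on model rollouts.
    beta is an assumed expert-vs-diverse weighting, not the paper's value."""
    m_obs, m_act = rollout_in_model(policy, dynamics, start_obs, horizon)
    r_data = (beta * reward_net(*expert).mean()
              + (1.0 - beta) * reward_net(*diverse).mean())
    r_model = reward_net(m_obs, m_act).mean()
    loss = r_model - r_data  # conservative w.r.t. model-generated samples
    reward_opt.zero_grad()
    loss.backward()
    reward_opt.step()
    return loss.item()
```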
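For the Open Datasets row: the D4RL data used in the experiments can be loaded with the standard d4rl API, as in the sketch below; the environment id "hopper-medium-v2" is just one example task.

```python
import gym
import d4rl  # importing d4rl registers the D4RL environment ids with gym

env = gym.make("hopper-medium-v2")     # example D4RL task id
dataset = d4rl.qlearning_dataset(env)  # dict of transition arrays
print(dataset["observations"].shape,
      dataset["actions"].shape,
      dataset["rewards"].shape)
```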
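For the Dataset Splits row: since the paper reports only that dynamics models were chosen by validation prediction error on a held-out set, a reproduction has to pick its own split. The sketch below holds out a random fraction of transitions; the 10% ratio and the function name are assumptions.

```python
import numpy as np

def split_transitions(dataset, val_frac=0.1, seed=0):
    """Hold out a random fraction of transitions for validating the learned
    dynamics model. val_frac=0.1 is an assumed ratio; the paper does not
    report the split it used."""
    n = len(dataset["observations"])
    idx = np.random.default_rng(seed).permutation(n)
    n_val = int(n * val_frac)
    take = lambda ids: {k: v[ids] for k, v in dataset.items()}
    return take(idx[n_val:]), take(idx[:n_val])  # (train, validation)
```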
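For the Experiment Setup row: Table 2 of the paper reports the concrete hyperparameters. A reimplementation's configuration might mirror its categories as below; every value shown is a generic placeholder, not the paper's setting.

```python
# Skeleton config mirroring the hyperparameter categories in the paper's
# Table 2. All values are placeholders for illustration; take the real
# settings from Table 2 itself.
clare_config = {
    "reward_lr": 3e-4,             # placeholder learning rate (reward net)
    "policy_lr": 3e-4,             # placeholder learning rate (policy)
    "model_lr": 1e-3,              # placeholder learning rate (dynamics)
    "batch_size": 256,             # placeholder
    "rollout_horizon": 5,          # placeholder model-rollout length
    "regularization_weight": 1.0,  # placeholder conservatism weight
    "discount_factor": 0.99,       # placeholder gamma
    "train_steps": 100_000,        # placeholder number of steps/epochs
}
```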