CLARE: Conservative Model-Based Reward Learning for Offline Inverse Reinforcement Learning
Authors: Sheng Yue, Guanbo Wang, Wei Shao, Zhaofeng Zhang, Sen Lin, Ju Ren, Junshan Zhang
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments corroborate the significant performance gains of CLARE over existing state-of-the-art algorithms on MuJoCo continuous control tasks (especially with a small offline dataset), and the learned reward is highly instructive for further learning (source code). |
| Researcher Affiliation | Academia | Sheng Yue (Tsinghua University), Guanbo Wang (Tongji University), Wei Shao (University of California, Davis), Zhaofeng Zhang (Arizona State University), Sen Lin (Ohio State University), Ju Ren (Tsinghua University; Zhongguancun Laboratory), Junshan Zhang (University of California, Davis) |
| Pseudocode | Yes | Algorithm 1: Conservative model-based reward learning (CLARE) |
| Open Source Code | Yes | "Extensive experiments corroborate the significant performance gains of CLARE over existing state-of-the-art algorithms on MuJoCo continuous control tasks (especially with a small offline dataset), and the learned reward is highly instructive for further learning (source code)." and "Our implementation is built upon the open source framework of offline RL algorithms, provided at: https://github.com/polixir/OfflineRL" |
| Open Datasets | Yes | "we compare CLARE with the following existing offline IRL methods on the D4RL benchmark (Fu et al., 2020)" and "the D4RL dataset provided at: https://github.com/rail-berkeley/d4rl (under the Apache License 2.0)" (a minimal loading sketch is given below the table). |
| Dataset Splits | No | The paper mentions selecting dynamics models by "validation prediction error on a held-out set" but does not explicitly provide the train/validation/test splits (percentages, counts, or standard splits) for the main D4RL datasets used in its experiments (a hypothetical split sketch is given below the table). |
| Hardware Specification | Yes | We implement the code in PyTorch 1.11.0 on a server with a 32-core AMD Ryzen Threadripper PRO 3975WX and an NVIDIA GeForce RTX 3090 Ti. |
| Software Dependencies | Yes | We implement the code in PyTorch 1.11.0 |
| Experiment Setup | Yes | "Appendix A.2 HYPERPARAMETERS" and "Table 2: Hyperparameters for CLARE," which lists specific values for learning rates, batch size, horizon, regularization weight, discount factor, and number of steps/epochs. |
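
A minimal sketch (not taken from the paper) of how the D4RL data cited in the "Open Datasets" row can be loaded, assuming the `gym` and `d4rl` packages from https://github.com/rail-berkeley/d4rl are installed; the task name `hopper-medium-v2` is an illustrative choice rather than a confirmed evaluation task from the paper.

```python
import gym
import d4rl  # importing d4rl registers the offline benchmark environments with gym

# Illustrative task name; the paper's exact D4RL task set is not restated here.
env = gym.make('hopper-medium-v2')

# qlearning_dataset returns a dict of aligned numpy arrays of transitions.
dataset = d4rl.qlearning_dataset(env)
observations = dataset['observations']            # states s
actions = dataset['actions']                      # actions a
rewards = dataset['rewards']                      # rewards r (not needed for reward learning itself)
next_observations = dataset['next_observations']  # successor states s'
terminals = dataset['terminals']                  # episode-termination flags

print(observations.shape, actions.shape)
```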
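The paper reports choosing dynamics models by validation prediction error on a held-out set but does not state the split, so the following sketch shows one plausible way to carve such a split from the transitions loaded above; the 90/10 ratio and the `train_valid_split` helper are assumptions, not the authors' protocol.

```python
import numpy as np

def train_valid_split(dataset, valid_frac=0.1, seed=0):
    """Randomly hold out a fraction of transitions for dynamics-model validation.

    `dataset` is a dict of equally long numpy arrays, e.g. the dict returned
    by d4rl.qlearning_dataset(env) in the previous sketch.
    """
    n = len(dataset['observations'])
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_valid = int(n * valid_frac)
    valid_idx, train_idx = idx[:n_valid], idx[n_valid:]
    take = lambda ids: {k: v[ids] for k, v in dataset.items()}
    return take(train_idx), take(valid_idx)

# Hypothetical 90/10 split; the paper does not specify its actual ratio.
train_set, valid_set = train_valid_split(dataset, valid_frac=0.1)
```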