Receding Horizon Inverse Reinforcement Learning
Authors: Yiqing Xu, Wei Gao, David Hsu
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on benchmark tasks show that RHIRL outperforms several leading IRL algorithms in most instances. We investigate two main questions. Does RHIRL scale up to high-dimensional continuous control tasks? Does RHIRL learn a robust cost function under noise? We compare RHIRL with two leading IRL algorithms, namely AIRL [8] and f-IRL [27], and one imitation learning algorithm, GAIL [13]. Our benchmark set consists of six continuous control tasks (Figure 2) from OpenAI Gym [2]. Table 1: Performance comparison of RHIRL and other methods. |
| Researcher Affiliation | Academia | Yiqing Xu¹, Wei Gao¹, David Hsu¹,²; ¹School of Computing, ²Smart Systems Institute, National University of Singapore; {xuyiqing,gaowei90,dyhsu}@comp.nus.edu.sg |
| Pseudocode | Yes | Algorithm 1 RHIRL |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes], code is included in the supplementary material. |
| Open Datasets | No | The paper states: "We use World Model [11] to generate expert demonstration data for Car Racing and use SAC [12] for the other tasks. We add Gaussian noise to the input controls and collect expert demonstrations at different control noise levels." While the environments (OpenAI Gym) are public, the specific expert demonstration datasets collected for this paper are not stated to be publicly available with a direct link, DOI, or repository. (A hedged sketch of this style of noisy demonstration collection appears after the table.) |
| Dataset Splits | No | The paper describes how expert demonstrations are generated and used for learning, but it does not specify explicit training, validation, or testing splits for these collected demonstrations. The evaluation is focused on the performance of the induced policy. |
| Hardware Specification | Yes | All experiments are conducted on a single machine with Intel(R) Core(TM) i9-10920X CPU @ 3.50GHz, 128GB of RAM, and 4 NVIDIA GeForce RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions using "the implementation of AIRL, GAIL, and f-IRL from the f-IRL's official repository along with the reported hyperparameters [27]", but does not provide specific version numbers for software dependencies such as Python, PyTorch, TensorFlow, or other libraries. |
| Experiment Setup | Yes | We use the implementation of AIRL, GAIL, and f-IRL from the f-IRL's official repository along with the reported hyperparameters [27], whenever possible. We also perform hyperparameter search on a grid to optimize the performance of every method on every task. The specific hyperparameter settings used are reported in Appendix D.2. (A generic grid-search sketch is shown after the table.) |
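
The Open Datasets row quotes the paper's description of collecting expert demonstrations by adding Gaussian noise to the expert's input controls at different noise levels. The paper's actual data-collection code is not reproduced in this report; the snippet below is only an illustrative sketch under assumptions: a pre-trained expert object exposing a hypothetical `act(obs)` method, the older Gym `step` API that returns a 4-tuple, and placeholder episode counts.

```python
# Illustrative sketch (not the authors' code): roll out a pre-trained expert
# in an OpenAI Gym task while injecting Gaussian noise into its controls.
import numpy as np
import gym

def collect_noisy_demos(env_name, expert, noise_std, n_episodes=10, seed=0):
    """Collect (obs, action, reward) trajectories at one control-noise level."""
    env = gym.make(env_name)
    rng = np.random.default_rng(seed)
    demos = []
    for _ in range(n_episodes):
        obs, done, traj = env.reset(), False, []   # classic Gym reset API
        while not done:
            action = expert.act(obs)               # hypothetical expert interface
            # Add zero-mean Gaussian noise to the control, then keep it valid.
            action = action + rng.normal(0.0, noise_std, size=np.shape(action))
            action = np.clip(action, env.action_space.low, env.action_space.high)
            next_obs, reward, done, _ = env.step(action)  # classic 4-tuple step API
            traj.append((obs, action, reward))
            obs = next_obs
        demos.append(traj)
    return demos
```
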
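The Experiment Setup row notes that hyperparameters were tuned by searching on a grid for every method and task, with the chosen settings reported in the paper's Appendix D.2. Those settings are not repeated here; the sketch below is a generic grid-search skeleton with placeholder hyperparameter names and ranges and a user-supplied `train_and_eval` routine, not the paper's tuning code.

```python
# Generic grid-search skeleton: train with every combination of hyperparameter
# values and keep the configuration with the best evaluation score.
from itertools import product

grid = {                      # placeholder names and ranges, for illustration only
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "horizon": [5, 10, 20],
    "batch_size": [64, 128],
}

def grid_search(train_and_eval, grid):
    """train_and_eval(**config) -> scalar score (e.g., average task return)."""
    best_score, best_cfg = float("-inf"), None
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        score = train_and_eval(**cfg)   # user-supplied training/evaluation routine
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_cfg, best_score
```
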