Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees
Authors: Siliang Zeng, Chenliang Li, Alfredo Garcia, Mingyi Hong
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments to demonstrate that the proposed algorithm outperforms many state-of-the-art IRL algorithms in both policy estimation and reward recovery. In particular, when transferring to a new environment, RL algorithms using rewards recovered by the proposed algorithm outperform those that use rewards recovered from existing IRL and imitation learning benchmarks. |
| Researcher Affiliation | Academia | Siliang Zeng, University of Minnesota, Twin Cities, Minneapolis, MN, USA (zeng0176@umn.edu); Chenliang Li, The Chinese University of Hong Kong, Shenzhen, China (chenliangli@link.cuhk.edu.cn); Alfredo Garcia, Texas A&M University, College Station, TX, USA (alfredo.garcia@tamu.edu); Mingyi Hong, University of Minnesota, Twin Cities, Minneapolis, MN, USA (mhong@umn.edu) |
| Pseudocode | Yes | Algorithm 1 Maximum Likelihood Inverse Reinforcement Learning (ML-IRL); a hedged sketch of its alternating structure is given after the table. |
| Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] |
| Open Datasets | Yes | For the expert dataset, we use the data provided in the official implementation of f-IRL (https://github.com/twni2016/f-IRL). |
| Dataset Splits | No | The paper mentions 'hyperparameter settings and simulation details are provided in Appendix B', but the main text does not explicitly detail the dataset splits for training, validation, and testing. |
| Hardware Specification | No | The paper mentions using MuJoCo for robotics control tasks but does not specify any particular hardware (GPU/CPU models, memory) used to run the experiments in the main text. The checklist states that hardware details are included, which suggests they appear in an appendix not part of the provided text. |
| Software Dependencies | No | The paper mentions using 'soft Actor-Critic [22] as the base RL algorithm' but does not specify version numbers for any software components, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | The hyperparameter settings and simulation details are provided in Appendix B. |
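For orientation, the sketch below illustrates the alternating two-level structure that Algorithm 1 (ML-IRL) describes: one soft policy improvement step under the current reward estimate, followed by a stochastic gradient step on the reward parameters that raises the likelihood of expert trajectories relative to trajectories from the current policy. This is a minimal sketch under stated assumptions, not the authors' implementation; all names (`reward_net`, `policy.soft_policy_step`, `rollout`, `sample_expert_batch`) and the `policy.act` / Gymnasium-style `env` interfaces are illustrative.

```python
# Hedged sketch of the ML-IRL alternating loop (Algorithm 1 in the paper).
# All class/function names here are illustrative assumptions, not the
# authors' code, which was not released.
import random
import torch


def sample_expert_batch(expert_trajs):
    """Pick one stored expert trajectory: a (states, actions) tensor pair."""
    return random.choice(expert_trajs)


def rollout(env, policy, horizon=1000):
    """Collect one trajectory from the current policy (assumed interfaces)."""
    states, actions = [], []
    s, _ = env.reset()
    for _ in range(horizon):
        a = policy.act(s)  # assumed policy interface
        states.append(torch.as_tensor(s, dtype=torch.float32))
        actions.append(torch.as_tensor(a, dtype=torch.float32))
        s, _, terminated, truncated, _ = env.step(a)
        if terminated or truncated:
            break
    return torch.stack(states), torch.stack(actions)


def ml_irl(env, expert_trajs, reward_net, policy, iters=1000, lr=1e-3):
    """Alternate one soft policy improvement step with one reward update."""
    opt = torch.optim.Adam(reward_net.parameters(), lr=lr)
    for _ in range(iters):
        # Lower level: improve the policy under the current reward r_theta
        # (the paper uses Soft Actor-Critic as the base RL algorithm).
        policy.soft_policy_step(env, reward_net)

        # Upper level: stochastic gradient of the trajectory log-likelihood.
        # Under the maximum-entropy model this reduces to the gap between
        # cumulative reward gradients on agent vs. expert trajectories.
        se, ae = sample_expert_batch(expert_trajs)
        sa, aa = rollout(env, policy)
        loss = reward_net(sa, aa).mean() - reward_net(se, ae).mean()
        opt.zero_grad()
        loss.backward()  # ascent on expert reward, descent on agent reward
        opt.step()
    return reward_net, policy
```

The single-loop design, taking one policy improvement step per reward update rather than solving the inner RL problem to convergence, is the feature the paper's finite-time analysis targets.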