Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees

Authors: Siliang Zeng, Chenliang Li, Alfredo Garcia, Mingyi Hong

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments to demonstrate that the proposed algorithm outperforms many state-of-the-art IRL algorithms in both policy estimation and reward recovery. In particular, when transferring to a new environment, RL algorithms using rewards recovered by the proposed algorithm outperform those that use rewards recovered from existing IRL and imitation learning benchmarks.
Researcher Affiliation | Academia | Siliang Zeng (University of Minnesota, Twin Cities, Minneapolis, MN, USA; zeng0176@umn.edu); Chenliang Li (The Chinese University of Hong Kong, Shenzhen, China; chenliangli@link.cuhk.edu.cn); Alfredo Garcia (Texas A&M University, College Station, TX, USA; alfredo.garcia@tamu.edu); Mingyi Hong (University of Minnesota, Twin Cities, Minneapolis, MN, USA; mhong@umn.edu)
Pseudocode | Yes | Algorithm 1: Maximum Likelihood Inverse Reinforcement Learning (ML-IRL). (A hedged sketch of one such update is given below the table.)
Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No]
Open Datasets | Yes | For the expert dataset, we use the data provided in the official implementation of f-IRL (https://github.com/twni2016/f-IRL).
Dataset Splits | No | The paper mentions that 'hyperparameter settings and simulation details are provided in Appendix B', but the main text does not explicitly detail the dataset splits for training, validation, and testing.
Hardware Specification | No | The paper mentions the use of MuJoCo for robotics control tasks but does not specify the hardware (GPU/CPU models, memory) used to run the experiments in the main text. The checklist states that hardware details are included, which suggests they appear in an appendix that was not provided.
Software Dependencies | No | The paper mentions using 'soft Actor-Critic [22] as the base RL algorithm' but does not specify version numbers for any software components, libraries, or frameworks used in the experiments.
Experiment Setup | Yes | The hyperparameter settings and simulation details are provided in Appendix B.
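
Since the paper provides pseudocode for Algorithm 1 (ML-IRL) but no released code, the following is a minimal, non-authoritative sketch of what one reward-update iteration of a maximum-likelihood IRL scheme of this kind could look like. It assumes a linear reward r_theta(s, a) = theta . phi(s, a) and uses the standard maximum-entropy likelihood gradient (expert minus current-policy feature expectations); the feature arrays, learning rate, and helper names are illustrative placeholders, not the authors' implementation, and in the paper's setup this step would alternate with a soft policy improvement step (e.g., an SAC update) under the current reward.

```python
# Hedged sketch of one reward update in an alternating max-likelihood IRL loop.
# Assumption: linear reward r_theta(s, a) = theta . phi(s, a); all names and
# data below are hypothetical stand-ins, not taken from the paper's code.

import numpy as np


def reward_gradient(theta, expert_feats, agent_feats):
    """Likelihood-gradient estimate for a linear reward.

    expert_feats, agent_feats: arrays of shape (num_samples, feat_dim) holding
    features phi(s, a) collected along expert and current-policy trajectories.
    Under the max-entropy likelihood, the gradient w.r.t. theta is the
    difference of expected features under the expert and the current policy.
    """
    return expert_feats.mean(axis=0) - agent_feats.mean(axis=0)


def ml_irl_reward_step(theta, expert_feats, agent_feats, lr=1e-2):
    """One gradient-ascent step on the reward parameters; in an alternating
    scheme this would be interleaved with a soft policy update under r_theta."""
    return theta + lr * reward_gradient(theta, expert_feats, agent_feats)


# Toy usage with random stand-in features (feat_dim = 4).
rng = np.random.default_rng(0)
theta = np.zeros(4)
expert_feats = rng.normal(0.5, 1.0, size=(128, 4))   # placeholder expert data
agent_feats = rng.normal(0.0, 1.0, size=(128, 4))    # placeholder rollout data
theta = ml_irl_reward_step(theta, expert_feats, agent_feats)
print(theta)
```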