Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees
Authors: Siliang Zeng, Chenliang Li, Alfredo Garcia, Mingyi Hong
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments to demonstrate that the proposed algorithm outperforms many state-of-the-art IRL algorithms in both policy estimation and reward recovery. In particular, when transferring to a new environment, RL algorithms using rewards recovered by the proposed algorithm outperform those that use rewards recovered from existing IRL and imitation learning benchmarks. |
| Researcher Affiliation | Academia | Siliang Zeng, University of Minnesota, Twin Cities, Minneapolis, MN, USA (zeng0176@umn.edu); Chenliang Li, The Chinese University of Hong Kong, Shenzhen, China (chenliangli@link.cuhk.edu.cn); Alfredo Garcia, Texas A&M University, College Station, TX, USA (alfredo.garcia@tamu.edu); Mingyi Hong, University of Minnesota, Twin Cities, Minneapolis, MN, USA (mhong@umn.edu) |
| Pseudocode | Yes | Algorithm 1 Maximum Likelihood Inverse Reinforcement Learning (ML-IRL); a hedged sketch of its alternating structure is given after the table. |
| Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] |
| Open Datasets | Yes | For the expert dataset, we use the data provided in the official implementation of f-IRL (https://github.com/twni2016/f-IRL). |
| Dataset Splits | No | The paper mentions 'hyperparameter settings and simulation details are provided in Appendix B', but the main text does not explicitly detail the dataset splits for training, validation, and testing. |
| Hardware Specification | No | The paper mentions using MuJoCo for robotics control tasks but does not specify any particular hardware (GPU/CPU models, memory) used to run the experiments in the main text. The checklist states that hardware details are included, which suggests they appear in an appendix not part of the provided text. |
| Software Dependencies | No | The paper mentions using 'soft Actor-Critic [22] as the base RL algorithm' but does not specify version numbers for any software components, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | The hyperparameter settings and simulation details are provided in Appendix B. |
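For orientation, the sketch below illustrates the alternating two-level structure that Algorithm 1 (ML-IRL) describes: one soft policy improvement step under the current reward estimate, followed by a stochastic gradient step on the reward parameters that raises the likelihood of expert trajectories relative to trajectories from the current policy. This is a minimal sketch under stated assumptions, not the authors' implementation; all names (`reward_net`, `policy.soft_policy_step`, `rollout`, `sample_expert_batch`) and the `policy.act` / Gymnasium-style `env` interfaces are illustrative.

```python
# Hedged sketch of the ML-IRL alternating loop (Algorithm 1 in the paper).
# All class/function names here are illustrative assumptions, not the
# authors' code, which was not released.
import random
import torch


def sample_expert_batch(expert_trajs):
    """Pick one stored expert trajectory: a (states, actions) tensor pair."""
    return random.choice(expert_trajs)


def rollout(env, policy, horizon=1000):
    """Collect one trajectory from the current policy (assumed interfaces)."""
    states, actions = [], []
    s, _ = env.reset()
    for _ in range(horizon):
        a = policy.act(s)  # assumed policy interface
        states.append(torch.as_tensor(s, dtype=torch.float32))
        actions.append(torch.as_tensor(a, dtype=torch.float32))
        s, _, terminated, truncated, _ = env.step(a)
        if terminated or truncated:
            break
    return torch.stack(states), torch.stack(actions)


def ml_irl(env, expert_trajs, reward_net, policy, iters=1000, lr=1e-3):
    """Alternate one soft policy improvement step with one reward update."""
    opt = torch.optim.Adam(reward_net.parameters(), lr=lr)
    for _ in range(iters):
        # Lower level: improve the policy under the current reward r_theta
        # (the paper uses Soft Actor-Critic as the base RL algorithm).
        policy.soft_policy_step(env, reward_net)

        # Upper level: stochastic gradient of the trajectory log-likelihood.
        # Under the maximum-entropy model this reduces to the gap between
        # cumulative reward gradients on agent vs. expert trajectories.
        se, ae = sample_expert_batch(expert_trajs)
        sa, aa = rollout(env, policy)
        loss = reward_net(sa, aa).mean() - reward_net(se, ae).mean()
        opt.zero_grad()
        loss.backward()  # ascent on expert reward, descent on agent reward
        opt.step()
    return reward_net, policy
```

The single-loop design, taking one policy improvement step per reward update rather than solving the inner RL problem to convergence, is the feature the paper's finite-time analysis targets.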