Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees
Authors: Siliang Zeng, Chenliang Li, Alfredo Garcia, Mingyi Hong
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments to demonstrate that the proposed algorithm outperforms many state-of-the-art IRL algorithms in both policy estimation and reward recovery. In particular, when transferring to a new environment, RL algorithms using rewards recovered by the proposed algorithm outperform those that use rewards recovered from existing IRL and imitation learning benchmarks. |
| Researcher Affiliation | Academia | Siliang Zeng University of Minnesota, Twin Cities Minneapolis, MN, USA EMAIL Chenliang Li The Chinese University of Hong Kong, Shenzhen, China EMAIL Alfredo Garcia Texas A&M University College Station, TX, USA EMAIL Mingyi Hong University of Minnesota, Twin Cities Minneapolis, MN, USA EMAIL |
| Pseudocode | Yes | Algorithm 1 Maximum Likelihood Inverse Reinforcement Learning (ML-IRL) |
| Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] |
| Open Datasets | Yes | For the expert dataset, we use the data provided in the official implementation of f-IRL (https://github.com/twni2016/f-IRL). |
| Dataset Splits | No | The paper mentions 'hyperparameter settings and simulation details are provided in Appendix B', but the main text does not explicitly detail the dataset splits for training, validation, and testing. |
| Hardware Specification | No | The paper mentions the use of MuJoCo for robotics control tasks but does not specify any particular hardware (GPU/CPU models, memory) used for running the experiments in the main text. The checklist confirms hardware details are included, but this implies they are in an appendix not provided. |
| Software Dependencies | No | The paper mentions using 'soft Actor-Critic [22] as the base RL algorithm' but does not specify version numbers for any software components, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | The hyperparameter settings and simulation details are provided in Appendix B. |