Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Inverse Reinforcement Learning with Locally Consistent Reward Functions
Authors: Quoc Phong Nguyen, Bryan Kian Hsiang Low, Patrick Jaillet
NeurIPS 2015 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluation on synthetic and real-world datasets shows that our IRL algorithm outperforms the state-of-the-art EM clustering with maximum likelihood IRL, which is, interestingly, a reduced variant of our approach. |
| Researcher Affiliation | Academia | Dept. of Computer Science, National University of Singapore, Republic of Singapore Dept. of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, USA |
| Pseudocode | No | The paper describes the EM algorithm mathematically and textually but does not include a formal pseudocode block or algorithm listing. |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating that source code for the described methodology is publicly available. |
| Open Datasets | No | The paper uses 'synthetic and real-world datasets', including 'GPS traces of 59 taxis' provided by 'The Comfort taxi company in Singapore', but does not provide concrete access information (link, DOI, or formal citation with authors/year) for public access to these datasets. |
| Dataset Splits | No | The paper evaluates performance on the 'expert's demonstrated trajectories' and uses Ntot as the total number of trajectories but does not specify any explicit train/validation/test dataset splits for model evaluation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., programming languages, libraries, or solvers with their versions) that were used for the experiments. |
| Experiment Setup | Yes | We set γ to 0.95 and the number |R| of reward functions of the agent to 2. To avoid local maxima in gradient ascent, we initialize our EM algorithm with 20 random θ values and report the best result based on the Q value of EM (Section 3). |
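
The random-restart strategy quoted in the Experiment Setup row can be illustrated with a minimal sketch. It is not the authors' code: `random_restarts`, `toy_fit`, the sampling range, and the two-peaked toy objective are all hypothetical stand-ins, chosen only to show how running an optimizer from 20 random starting values and keeping the run with the highest objective guards against a poor local maximum.

```python
import random


def random_restarts(fit, n_restarts=20, seed=0):
    """Run `fit` from several random starting points and keep the best run.

    `fit` is a hypothetical callable: it takes an initial parameter value and
    returns (fitted_parameter, objective_value), where higher is better.
    """
    rng = random.Random(seed)
    best_theta, best_q = None, float("-inf")
    for _ in range(n_restarts):
        theta0 = rng.uniform(-1.0, 1.0)  # assumed sampling range, not from the paper
        theta, q = fit(theta0)
        if q > best_q:
            best_theta, best_q = theta, q
    return best_theta, best_q


def toy_fit(theta0, steps=200, lr=0.05):
    """Toy stand-in for one optimizer run: gradient ascent on a two-peaked objective.

    The objective has a lower local maximum near -1 and a higher one near +1,
    so the result depends on the starting point.
    """
    def objective(t):
        return -(t * t - 1.0) ** 2 + 0.3 * t

    def gradient(t):
        return -4.0 * t * (t * t - 1.0) + 0.3

    t = theta0
    for _ in range(steps):
        t += lr * gradient(t)
    return t, objective(t)


if __name__ == "__main__":
    theta, q = random_restarts(toy_fit, n_restarts=20, seed=0)
    print(f"best parameter: {theta:.3f}, best objective: {q:.3f}")
```

A single run started on the wrong side (for example `toy_fit(-0.9)`) settles on the lower peak, whereas the restart wrapper returns the higher one as long as at least one starting value falls in its basin.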