Inverse Reinforcement Learning with Locally Consistent Reward Functions
Authors: Quoc Phong Nguyen, Bryan Kian Hsiang Low, Patrick Jaillet
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluation on synthetic and real-world datasets shows that our IRL algorithm outperforms the state-of-the-art EM clustering with maximum likelihood IRL, which is, interestingly, a reduced variant of our approach. |
| Researcher Affiliation | Academia | Dept. of Computer Science, National University of Singapore, Republic of Singapore; Dept. of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, USA |
| Pseudocode | No | The paper describes the EM algorithm mathematically and textually but does not include a formal pseudocode block or algorithm listing. |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating that source code for the described methodology is publicly available. |
| Open Datasets | No | The paper uses 'synthetic and real-world datasets', including 'GPS traces of 59 taxis' provided by 'The Comfort taxi company in Singapore', but does not provide concrete access information (link, DOI, or formal citation with authors/year) for public access to these datasets. |
| Dataset Splits | No | The paper evaluates performance on the expert's demonstrated trajectories and uses N_tot as the total number of trajectories, but does not specify any explicit train/validation/test dataset splits for model evaluation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., programming languages, libraries, or solvers with their versions) that were used for the experiments. |
| Experiment Setup | Yes | We set γ to 0.95 and the number \|R\| of reward functions of the agent to 2. To avoid local maxima in gradient ascent, we initialize our EM algorithm with 20 random θ₀ values and report the best result based on the Q value of EM (Section 3). |
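
The experiment-setup row describes a standard multi-restart EM protocol: run EM from several random initial parameter vectors θ₀ and keep the run with the highest EM objective Q. The sketch below illustrates only that restart-and-select loop, assuming the reported values (γ = 0.95, |R| = 2 reward functions, 20 restarts); `run_em`, the feature dimension `NUM_FEATURES`, and the surrogate Q are placeholders, since the paper's actual E- and M-steps are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

GAMMA = 0.95       # discount factor, as reported in the paper
NUM_REWARDS = 2    # |R|: number of reward functions of the agent
NUM_RESTARTS = 20  # random initializations to avoid local maxima
NUM_FEATURES = 10  # reward feature dimension (assumed; not stated here)

def run_em(theta0):
    """Stub for one EM run over the demonstrated trajectories.

    A real implementation would alternate an E-step (posterior over which
    of the |R| reward functions is locally active) with an M-step
    (gradient ascent on theta), returning the fitted parameters and the
    final EM objective Q. Here we return the initial point and a
    surrogate Q so the restart loop below is runnable.
    """
    theta = theta0.copy()
    q_value = float(-np.sum(theta ** 2))  # surrogate Q (placeholder)
    return theta, q_value

# Restart EM from 20 random theta_0 draws and keep the run with the best Q.
best_theta, best_q = None, -np.inf
for _ in range(NUM_RESTARTS):
    theta0 = rng.normal(size=(NUM_REWARDS, NUM_FEATURES))
    theta, q = run_em(theta0)
    if q > best_q:
        best_theta, best_q = theta, q

print(f"best Q over {NUM_RESTARTS} restarts: {best_q:.3f}")
```

Selecting by the EM objective Q, rather than by downstream task performance, is the criterion the quoted setup names; with a real `run_em` the loop above would reproduce that selection step directly.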