Between Imitation and Intention Learning
Authors: James MacGlashan, Michael L. Littman
IJCAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present empirical results on multiple domains that demonstrate that performing IRL with a small, but non-zero, receding planning horizon greatly decreases the computational cost of planning while maintaining superior generalization performance compared to imitation learning. |
| Researcher Affiliation | Academia | James MacGlashan, Brown University, james_macglashan@brown.edu; Michael L. Littman, Brown University, mlittman@cs.brown.edu |
| Pseudocode | Yes | Algorithm 1: Compute Value(s, h) (see the first sketch after this table) |
| Open Source Code | Yes | We have also made RHIRL publicly available as part of BURLAP, an open source reinforcement learning and planning library (http://burlap.cs.brown.edu/). |
| Open Datasets | No | The paper uses expert demonstrations generated by the authors for the navigation, mountain car, and lunar lander domains. It does not provide concrete access (link, DOI, specific citation to a public dataset) to these training demonstrations. |
| Dataset Splits | No | The paper mentions testing generalization performance on novel states but does not explicitly specify training/validation/test dataset splits or sample counts for validation data. |
| Hardware Specification | No | The paper mentions 'total training CPU time' but does not provide specific details about the hardware used, such as CPU models, GPU models, or memory. |
| Software Dependencies | No | The paper mentions using 'Weka's J48 classifier' and 'Weka's logistic regression implementation' but does not specify version numbers for Weka or the classifiers. |
| Experiment Setup | Yes | RHIRL used 10 steps of gradient ascent for navigation and lunar lander, and 15 steps for mountain car. To facilitate generalization, the learned reward function is a linear combination of both task features and agent-space features (see the second sketch after this table). |
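
The paper's Algorithm 1, Compute Value(s, h), performs a finite-horizon Bellman backup so that planning cost scales with the receding horizon h rather than with full value-iteration convergence. Below is a minimal Python sketch of that idea, not the paper's BURLAP implementation: the reward function `R(s, a)`, transition model `T(s, a)`, and the memoization `cache` are hypothetical stand-ins.

```python
def compute_value(s, h, actions, R, T, cache):
    """Finite-horizon value of state s with h planning steps remaining.

    Assumed interfaces (not from the paper):
      R(s, a)  -> float: learned reward for taking action a in state s.
      T(s, a)  -> list of (next_state, probability) pairs.
      cache    -> dict memoizing (state, horizon) -> value.
    """
    if h == 0:
        return 0.0  # no steps left: nothing more can be collected
    key = (s, h)
    if key in cache:
        return cache[key]
    # Bellman backup over the remaining horizon h.
    v = max(
        R(s, a) + sum(p * compute_value(s2, h - 1, actions, R, T, cache)
                      for s2, p in T(s, a))
        for a in actions
    )
    cache[key] = v
    return v


# Toy usage: a two-state chain where moving "right" from state 0 is rewarded.
actions = ["left", "right"]
R = lambda s, a: 1.0 if (s == 0 and a == "right") else 0.0
T = lambda s, a: [(min(s + 1, 1) if a == "right" else max(s - 1, 0), 1.0)]
print(compute_value(0, 3, actions, R, T, {}))  # 3-step lookahead value from state 0
```

Because values are indexed by (state, horizon), a shallow horizon keeps the recursion tree small, which is the source of the computational savings the paper reports.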
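The experiment setup pairs a linear reward, a weight vector applied to concatenated task and agent-space features, with a small fixed number of gradient-ascent steps (10 for navigation and lunar lander, 15 for mountain car). The sketch below illustrates that setup under stated assumptions: `phi_task`, `phi_agent`, and `log_likelihood` are hypothetical placeholders for the paper's feature vectors and demonstration likelihood, and the gradient is estimated by central differences rather than the paper's analytic gradient.

```python
import numpy as np

def reward(w, s, a, phi_task, phi_agent):
    """Linear reward: weights w applied to concatenated task and
    agent-space feature vectors (feature functions are assumptions)."""
    phi = np.concatenate([phi_task(s, a), phi_agent(s, a)])
    return float(w @ phi)

def gradient_ascent(w, log_likelihood, steps=10, lr=0.1, eps=1e-5):
    """Run a fixed number of gradient-ascent steps on the demonstration
    log-likelihood. w is assumed to be a float array; the gradient here
    is a central-difference estimate, used only to keep the sketch
    self-contained."""
    for _ in range(steps):
        grad = np.zeros_like(w)
        for i in range(len(w)):
            e = np.zeros_like(w)
            e[i] = eps
            grad[i] = (log_likelihood(w + e) - log_likelihood(w - e)) / (2 * eps)
        w = w + lr * grad
    return w


# Toy usage: ascend a concave quadratic whose maximum sits at w = [1, 2].
ll = lambda w: -np.sum((w - np.array([1.0, 2.0])) ** 2)
w = gradient_ascent(np.zeros(2), ll, steps=15, lr=0.2)
print(w)  # converges toward [1, 2]
```

Capping the number of ascent steps, as the table reports for each domain, acts as a simple computational budget on reward learning rather than running the optimization to convergence.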