Between Imitation and Intention Learning

Authors: James MacGlashan, Michael L. Littman

IJCAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We present empirical results on multiple domains that demonstrate that performing IRL with a small, but non-zero, receding planning horizon greatly decreases the computational cost of planning while maintaining superior generalization performance compared to imitation learning. |
| Researcher Affiliation | Academia | James MacGlashan, Brown University (jamesmacglashan@brown.edu); Michael L. Littman, Brown University (mlittman@cs.brown.edu) |
| Pseudocode | Yes | Algorithm 1: ComputeValue(s, h) (an illustrative sketch of this finite-horizon recursion follows the table) |
| Open Source Code | Yes | We have also made RHIRL publically available as part of BURLAP, an open source reinforcement learning and planning library. (http://burlap.cs.brown.edu/) |
| Open Datasets | No | The paper uses expert demonstrations generated by the authors for the navigation, mountain car, and lunar lander domains. It does not provide concrete access (link, DOI, or citation to a public dataset) to these training demonstrations. |
| Dataset Splits | No | The paper mentions testing generalization performance on novel states but does not explicitly specify training/validation/test splits or sample counts for validation data. |
| Hardware Specification | No | The paper reports 'total training CPU time' but does not provide details about the hardware used, such as CPU models, GPU models, or memory. |
| Software Dependencies | No | The paper mentions using 'Weka's J48 classifier' and 'Weka's logistic regression implementation' but does not specify version numbers for Weka or the classifiers. |
| Experiment Setup | Yes | RHIRL used 10 steps of gradient ascent (navigation and lunar lander) and 15 steps of gradient ascent (mountain car). To facilitate generalization, the learned reward function is a linear combination of both task features and agent-space features. (an illustrative gradient-ascent sketch follows the table) |
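The Pseudocode row refers to the paper's Algorithm 1, ComputeValue(s, h), which is not reproduced here. The sketch below only illustrates the general shape of a finite-horizon (receding-horizon) value recursion; the function names and the MDP interface (`actions`, `transition_probs`, `reward`) as well as the memoization are assumptions for this sketch, not the authors' BURLAP implementation.

```python
# Illustrative finite-horizon value recursion in the spirit of
# "Algorithm 1: ComputeValue(s, h)". All interfaces here are assumed.
from functools import lru_cache

def make_compute_value(actions, transition_probs, reward, gamma=1.0):
    """Build a memoized ComputeValue(s, h) for a small, finite MDP.

    actions:          iterable of actions
    transition_probs: (s, a) -> iterable of (next_state, probability) pairs
    reward:           (s, a, next_state) -> float (in RHIRL, the learned reward)
    """
    actions = tuple(actions)  # make the action set reusable across calls

    @lru_cache(maxsize=None)
    def compute_value(s, h):
        # Base case: no planning steps remain within the receding horizon.
        if h == 0:
            return 0.0
        # Finite-horizon Bellman backup: best expected return over h steps.
        return max(
            sum(p * (reward(s, a, s2) + gamma * compute_value(s2, h - 1))
                for s2, p in transition_probs(s, a))
            for a in actions
        )

    return compute_value
```

With a small horizon h, this recursion only expands states reachable within h steps of the current state, which is the source of the computational savings the Research Type row describes.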
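The Experiment Setup row reports gradient ascent over a reward function that is a linear combination of task and agent-space features. As a rough illustration only, the sketch below pairs a linear reward with a generic gradient-ascent loop over a demonstration log-likelihood; the names `linear_reward`, `gradient_ascent`, and `log_likelihood`, the finite-difference gradient, and the learning rate are all hypothetical and are not taken from the paper or from BURLAP.

```python
import numpy as np

def linear_reward(theta, phi):
    """R(s, a) = theta . phi(s, a), where phi stacks task and agent-space features."""
    return float(np.dot(theta, phi))

def gradient_ascent(log_likelihood, theta0, n_steps=10, learning_rate=0.1, eps=1e-4):
    """Maximize a demonstration log-likelihood over reward weights theta.

    log_likelihood: theta -> scalar score of the expert demonstrations under
                    the policy induced by the reward theta . phi.
    A central finite-difference gradient is used purely for illustration.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(n_steps):  # e.g. 10 steps (navigation, lunar lander) or 15 (mountain car)
        grad = np.zeros_like(theta)
        for i in range(theta.size):
            step = np.zeros_like(theta)
            step[i] = eps
            grad[i] = (log_likelihood(theta + step) - log_likelihood(theta - step)) / (2 * eps)
        theta += learning_rate * grad  # ascend the likelihood surface
    return theta
```

In the paper's setting, the likelihood would presumably be computed from receding-horizon values such as those returned by ComputeValue(s, h), and the loop would run for the 10 or 15 steps noted in the table; those connections are paraphrased here rather than taken from released code.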