Planning with Abstract Learned Models While Learning Transferable Subtasks
Authors: John Winder, Stephanie Milani, Matthew Landen, Erebus Oh, Shane Parr, Shawn Squire, Marie desJardins, Cynthia Matuszek
AAAI 2020, pp. 9992–10000
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4: Experimental Methodology |
| Researcher Affiliation | Academia | John Winder,1 Stephanie Milani,2 Matthew Landen,3 Erebus Oh,1 Shane Parr,4 Shawn Squire,1 Marie des Jardins,5 and Cynthia Matuszek1 1University of Maryland, Baltimore County, 2Carnegie Mellon University, 3Georgia Institute of Technology, 4University of Massachusetts Amherst, 5Simmons University |
| Pseudocode | Yes | Algorithm 1 Planning with Abstract Learned Models |
| Open Source Code | No | The paper does not provide any explicit statements about open-sourcing the code or links to a code repository. |
| Open Datasets | Yes | The Taxi domain (Dietterich 2000) is a common HRL problem... The Cleanup domain simulates a robot that tidies a house by putting blocks where they belong, similar to the game of Sokoban (MacGlashan et al. 2015; Guez et al. 2019). |
| Dataset Splits | No | The paper does not explicitly provide specific percentages, sample counts, or citations to predefined splits for training, validation, and test datasets. It mentions using established domains and grounding to new, random target MDPs for trials. |
| Hardware Specification | Yes | Experiments were performed on an Intel Core i7-4790K CPU @ 4.00 GHz with 20 GB of RAM. |
| Software Dependencies | No | The paper mentions using 'Value Iteration as the planner and R-MAX as the model-based RL algorithm' but does not specify version numbers for any software dependencies or libraries. |
| Experiment Setup | No | The paper describes the domains (Taxi, Cleanup) and the types of hierarchies used (expert, learned, amended), and mentions that Value Iteration and R-MAX were used as algorithms. However, it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed training configurations. |
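The table notes that the paper uses Value Iteration as its planner without specifying an implementation. As a point of reference, the sketch below shows standard Value Iteration on a toy two-state MDP. This is not the paper's code; the transition model, rewards, discount factor, and convergence tolerance are all illustrative assumptions.

```python
# Standard Value Iteration sketch (not the paper's implementation).
# P[s][a] is a list of (probability, next_state) pairs; R[s][a] is the
# immediate reward. gamma and tol are assumed, illustrative values.

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    n = len(P)
    V = [0.0] * n
    while True:
        delta = 0.0
        for s in range(n):
            # Bellman backup: best one-step lookahead value over actions.
            q = [R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                 for a in range(len(P[s]))]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

# Toy MDP: in state 0, action "stay" yields 0, action "go" yields 1 and
# moves to state 1; state 1 is absorbing with zero reward.
P = [
    [[(1.0, 0)], [(1.0, 1)]],  # state 0: stay, go
    [[(1.0, 1)], [(1.0, 1)]],  # state 1: absorbing
]
R = [[0.0, 1.0], [0.0, 0.0]]

V = value_iteration(P, R)
# V[0] converges to 1.0 (take "go" once); V[1] converges to 0.0.
```

In R-MAX-style model-based RL, a planner like this is re-run over the learned model, with unknown state-action pairs assigned an optimistic reward to drive exploration.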