Hierarchy Through Composition with Multitask LMDPs

Authors: Andrew M. Saxe, Adam C. Earle, Benjamin Rosman

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5. Experiments
Researcher Affiliation | Academia | 1 Center for Brain Science, Harvard University; 2 School of Computer Science and Applied Mathematics, University of the Witwatersrand; 3 Council for Scientific and Industrial Research, South Africa.
Pseudocode | Yes | We now describe the execution model (see Supplementary Material for pseudocode listing).
Open Source Code | No | The paper mentions 'see Supplementary Material for pseudocode listing' but does not explicitly state that source code for the methodology is openly available or provide a link.
Open Datasets | Yes | To illustrate the operation of our scheme, we apply it to a 2D grid-world rooms domain (Fig. 3(a)). This domain is shown in Fig. 5, and is taken from previous experiments in transfer learning (Fernández & Veloso, 2006).
Dataset Splits | No | The paper describes how the agent is initialized and trajectory lengths are capped, but it does not specify dataset splits for training, validation, and testing with percentages or sample counts.
Hardware Specification | No | The paper does not mention any specific hardware (e.g., GPU/CPU models, processor types, memory) used for running its experiments.
Software Dependencies | No | The paper mentions learning algorithms like Q-learning but does not provide specific software names with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | In the options framework the agent's action space is augmented with the full set of option policies. The initialization set for these options is the full state space, so that any option may be executed at any time. The termination condition is defined such that the option terminates only when it reaches its goal state. To minimize the action space for the options agent, we remove the primitive actions, reducing the learning problem to simply choosing the single correct option from each state. The options learning problem is solved using Q-learning with sigmoidal learning rate decrease and ϵ-greedy exploitation. These parameters were optimized on a coarse grid to yield the fastest learning curves.
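
The Experiment Setup row describes a tabular Q-learning baseline whose only actions are goal-directed options. The sketch below illustrates that setup: ϵ-greedy selection over options (primitive actions removed), each option terminating only at its goal state, and a sigmoidally decreasing learning rate. Everything concrete here is an assumption for illustration, not the authors' code: the option names in OPTIONS, the execute_option stand-in for the rooms domain (its step counts, -1-per-step cost, and goal bonus), and the exact constants in sigmoid_lr.

import math
import random
from collections import defaultdict

# Toy stand-in for the rooms domain: states/options are hypothetical names,
# path lengths and rewards are illustrative assumptions only.
OPTIONS = ["door_AB", "door_BC", "door_CD", "task_goal"]

def execute_option(state, option):
    """Run an option to termination.

    As in the setup above, every option may be invoked from any state and
    terminates only at its own goal state. Returns (next_state, reward, steps);
    the duration and -1-per-step cost are assumed values.
    """
    steps = 5                                   # assumed option duration
    reward = -1.0 * steps                       # assumed per-step cost
    if option == "task_goal":
        reward += 10.0                          # assumed goal bonus
    return option, reward, steps                # option ends at its goal state

def sigmoid_lr(t, lr0=0.5, midpoint=200.0, scale=50.0):
    """Sigmoidally decreasing learning rate (exact schedule is assumed)."""
    return lr0 / (1.0 + math.exp((t - midpoint) / scale))

def q_learning_over_options(episodes=500, gamma=0.95, epsilon=0.1):
    """Tabular epsilon-greedy Q-learning where the action set is the options."""
    Q = defaultdict(float)                      # Q[(state, option)]
    t = 0
    for _ in range(episodes):
        state = "start"
        for _ in range(50):                     # cap on option invocations
            # epsilon-greedy over options; primitive actions are removed
            if random.random() < epsilon:
                option = random.choice(OPTIONS)
            else:
                option = max(OPTIONS, key=lambda o: Q[(state, o)])
            next_state, reward, steps = execute_option(state, option)
            # SMDP-style Q-learning backup, discounting by the option's duration
            alpha = sigmoid_lr(t)
            best_next = max(Q[(next_state, o)] for o in OPTIONS)
            target = reward + (gamma ** steps) * best_next
            Q[(state, option)] += alpha * (target - Q[(state, option)])
            t += 1
            state = next_state
            if state == "task_goal":
                break
    return Q

if __name__ == "__main__":
    Q = q_learning_over_options()
    print({k: round(v, 2) for k, v in Q.items() if k[0] == "start"})

Because every option runs to its goal before control returns to the learner, the effective decision problem is small, which is why the paper can tune the learning-rate and exploration parameters on a coarse grid for the fastest learning curves.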